CPU to restrict the order.
Memory barriers are such interventions. They impose a perceived partial
-ordering between the memory operations specified on either side of the barrier.
-They request that the sequence of memory events generated appears to other
-parts of the system as if the barrier is effective on that CPU.
+ordering over the memory operations on either side of the barrier.
+
+Such enforcement is important because the CPUs and other devices in a system
+can use a variety of tricks to improve performance - including reordering,
+deferral and combination of memory operations; speculative loads; speculative
+branch prediction and various types of caching. Memory barriers are used to
+override or suppress these tricks, allowing the code to sanely control the
+interaction of multiple CPUs and/or devices.
VARIETIES OF MEMORY BARRIER
A write barrier is a partial ordering on stores only; it is not required
to have any effect on loads.
- A CPU can be viewed as as commiting a sequence of store operations to the
+ A CPU can be viewed as committing a sequence of store operations to the
memory system as time progresses. All stores before a write barrier will
occur in the sequence _before_ all the stores after the write barrier.
indirect effect will be the order in which the second CPU sees the effects
of the first CPU's accesses occur, but see the next point:
- (*) There is no guarantee that the a CPU will see the correct order of effects
+ (*) There is no guarantee that a CPU will see the correct order of effects
from a second CPU's accesses, even _if_ the second CPU uses a memory
barrier, unless the first CPU _also_ uses a matching memory barrier (see
the subsection on "SMP Barrier Pairing").
isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
Alpha).
-To deal with this, a data dependency barrier must be inserted between the
-address load and the data load:
+To deal with this, a data dependency barrier or better must be inserted
+between the address load and the data load:
CPU 1 CPU 2
=============== ===============
variable B might be stored in an even-numbered cache line. Then, if the
even-numbered bank of the reading CPU's cache is extremely busy while the
odd-numbered bank is idle, one can see the new value of the pointer P (&B),
-but the old value of the variable B (1).
+but the old value of the variable B (2).
Another example of where data dependency barriers might by required is where a
This sequence of events is committed to the memory coherence system in an order
that the rest of the system might perceive as the unordered set of { STORE A,
-STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
+STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
}:
+-------+ : :
: :
-If, however, a read barrier were to be placed between the load of E and the
+If, however, a read barrier were to be placed between the load of B and the
load of A on CPU 2:
CPU 1 CPU 2
There are some more advanced barrier functions:
(*) set_mb(var, value)
- (*) set_wmb(var, value)
- These assign the value to the variable and then insert at least a write
- barrier after it, depending on the function. They aren't guaranteed to
+ This assigns the value to the variable and then inserts at least a write
+ barrier after it, depending on the function. It isn't guaranteed to
insert anything more than a compiler barrier in a UP compilation.
On a UP system - where this wouldn't be a problem - the smp_mb() is just a
compiler barrier, thus making sure the compiler emits the instructions in the
-right order without actually intervening in the CPU. Since there there's only
-one CPU, that CPU's dependency ordering logic will take care of everything
-else.
+right order without actually intervening in the CPU. Since there's only one
+CPU, that CPU's dependency ordering logic will take care of everything else.
ATOMIC OPERATIONS
The PCI bus, amongst others, defines an I/O space concept - which on such
CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
- space. However, it may also mapped as a virtual I/O space in the CPU's
- memory map, particularly on those CPUs that don't support alternate
- I/O spaces.
+ space. However, it may also be mapped as a virtual I/O space in the CPU's
+ memory map, particularly on those CPUs that don't support alternate I/O
+ spaces.
Accesses to this space may be fully synchronous (as on i386), but
intermediary bridges (such as the PCI host bridge) may not fully honour