Programs contain sequences of statements, and a naive compiler would execute them in exactly the order in which they are written. But an optimizing compiler is free to reorder the statements, or even parts of them, as long as the resulting "net effect" is the same. The "measure" of the "net effect" is what the standard calls "side effects", and these are accomplished exclusively through accesses (reads and writes) to variables qualified as volatile. So, as long as all volatile reads and writes are to the same addresses and in the same order (and writes write the same values), the program is correct, regardless of other operations in it. (One important point to note here is that the time elapsed between consecutive volatile accesses is not considered at all.)
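A minimal sketch of this rule, assuming a hypothetical volatile variable port_reg that stands in for a hardware register, while plain_var is an ordinary variable. Only the two writes to port_reg, their values and their order are fixed; the stores to plain_var may be merged, dropped or moved relative to them, as long as plain_var ends up holding 2 for any later reader.

```c
/* Hypothetical stand-in for a memory-mapped I/O register. */
volatile unsigned char port_reg;

/* Ordinary variable: accesses to it are not part of the "net effect". */
unsigned char plain_var;

void pulse(void)
{
    plain_var = 1;  /* dead store: the optimiser may drop it entirely */
    port_reg = 1;   /* volatile write: must be emitted, with this value ... */
    plain_var = 2;  /* non-volatile: may be moved before or after the volatile writes */
    port_reg = 0;   /* ... and this volatile write must follow the one above */
}
```

Calling pulse() leaves port_reg at 0 and plain_var at 2 in any case; what the optimiser may change is only how, and whether, the intermediate stores to plain_var are emitted.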
Unfortunately, there are also operations which are not covered by volatile accesses. Examples of this in avr-gcc/avr-libc are the cli() and sei() macros defined in <avr/interrupt.h>, which convert directly to the respective assembler mnemonics through the __asm__() statement. These don't constitute a variable access at all, not even a volatile one, so the compiler is free to move them around. Although a "volatile" qualifier can be attached to the __asm__() statement, its effect on (re)ordering is not clear from the documentation (it more likely only prevents complete removal by the optimiser), as the documentation, among other things, states:
Note that even a volatile asm instruction can be moved relative to other code, including across jump instructions. [...] Similarly, you can't expect a sequence of volatile asm instructions to remain perfectly consecutive.
There is another mechanism which can be used to achieve something similar: memory barriers. A memory barrier is created by adding a special "memory" clobber to the inline asm statement, and ensures that all variables are flushed from registers to memory before the statement, and then re-read after the statement. The purpose of memory barriers is slightly different from enforcing code ordering: they are supposed to ensure that there are no variables "cached" in registers, so that it is safe to change the content of registers, e.g. when switching context in a multitasking OS. (On "big" processors with out-of-order execution, memory barriers also imply the use of special instructions which force the processor into an "in-order" state; this is not the case on AVRs.) Nevertheless, a memory barrier works well in ensuring that all volatile accesses before and after the barrier occur in the given order with respect to the barrier. However, it does not prevent the compiler from moving statements that perform no volatile accesses across the barrier. Peter Dannegger provided a nice example of this effect:
#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )

unsigned int ivar;

void test2( unsigned int val )
{
  val = 65535U / val;
  cli();
  ivar = val;
  sei();
}
compiles, with optimisations switched on (-Os), to:

00000112 <test2>:
 112: bc 01        movw r22, r24
 114: f8 94        cli
 116: 8f ef        ldi  r24, 0xFF ; 255
 118: 9f ef        ldi  r25, 0xFF ; 255
 11a: 0e 94 96 00  call 0x12c     ; 0x12c <__udivmodhi4>
 11e: 70 93 01 02  sts  0x0201, r23
 122: 60 93 00 02  sts  0x0200, r22
 126: 78 94        sei
 128: 08 95        ret
where the potentially slow division is moved across cli(), so that interrupts stay disabled longer than intended. Note that the store to ivar occurs in order with respect to cli() and sei(), so the "net effect" required by the standard is achieved as intended; it is "only" the timing which is off. However, for most embedded applications, timing is an important, sometimes critical, factor.
To sum it up:
- memory barriers ensure proper ordering of volatile accesses
- memory barriers don't prevent statements with no volatile accesses from being reordered across the barrier
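Building on the two points above, one possible way to keep the division out of the interrupt-disabled region is to route its result through a volatile temporary: the volatile store has to happen before the barrier, and it cannot happen before the division has produced its value. This is a sketch under stated assumptions, not a mechanism provided by avr-libc. IRQ_OFF(), IRQ_ON() and test2_fixed() are hypothetical names; on AVR they map to the avr-libc cli()/sei() macros, and on a host compiler they degrade to plain compiler memory barriers so the code can be built and exercised there.

```c
/* Hypothetical wrappers: real cli()/sei() on AVR, and on other targets
   empty asm statements with a "memory" clobber (compiler barriers only). */
#if defined(__AVR__)
#include <avr/interrupt.h>
#define IRQ_OFF() cli()
#define IRQ_ON()  sei()
#else
#define IRQ_OFF() __asm__ __volatile__("" ::: "memory")
#define IRQ_ON()  __asm__ __volatile__("" ::: "memory")
#endif

unsigned int ivar;

/* Variant of test2(): the quotient is first stored in a volatile local.
   That volatile store must be ordered before IRQ_OFF(), and it depends on
   the division result, so the division should complete while interrupts
   are still enabled. As in the original, val must be non-zero. */
void test2_fixed(unsigned int val)
{
    volatile unsigned int quotient = 65535U / val;  /* slow part, interrupts enabled */
    IRQ_OFF();
    ivar = quotient;  /* only this short copy runs with interrupts disabled */
    IRQ_ON();
}
```

The price is one extra store and load of quotient through the stack. Whether the division really ends up before the cli instruction should still be verified in the generated code (for example with avr-objdump -d), since, as the documentation quoted above warns, the placement of asm statements relative to other code is not fully specified.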