From: James Bottomley > Sent: 28 September 2015 16:12 > > > > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations. > > > > > > That's different: it's an atomic RMW operation. The problem with the > > > alpha was that the operation wasn't atomic (meaning that it can't be > > > interrupted and no intermediate output states are visible). > > > > It is only atomic if prefixed by the 'lock' prefix. > > Normally the read and write are separate bus cycles. > > The essential point is that x86 has atomic bit ops and byte writes. > Early alpha did not. Early alpha didn't have any byte accesses. On x86 if you have the following: struct { char a; volatile char b; } *foo; foo->a |= 4; The compiler is likely to generate a 'bis #4, 0(rbx)' (or similar) and the cpu will do two 32bit memory cycles that read and write the 'volatile' field 'b'. (gcc definitely used to do this...) A lot of fields were made 32bit (and probably not bitfields) in the linux kernel tree a year or two ago to avoid this very problem. > > > > You still have to ensure the compiler doesn't do wider rmw cycles. > > > > I believe the recent versions of gcc won't do wider accesses for volatile data. > > > > > > I don't understand this comment. You seem to be implying gcc would do a > > > 64 bit RMW for a 32 bit store ... that would be daft when a single > > > instruction exists to perform the operation on all architectures. > > > > Read the object code and weep... > > It is most likely to happen for operations that are rmw (eg bit set). > > For instance the arm cpu has limited offsets for 16bit accesses, for > > normal structures the compiler is likely to use a 32bit rmw sequence > > for a 16bit field that has a large offset. > > The C language allows the compiler to do it for any access (IIRC including > > volatiles). > > I think you might be confusing different things. Most RISC CPUs can't > do 32 bit store immediates because there aren't enough bits in their > arsenal, so they tend to split 32 bit loads into a left and right part > (first the top then the offset). This (and other things) are mostly > what you see in code. However, 32 bit register stores are still atomic, > which is all we require. It's not really the compiler's fault, it's > mostly an architectural limitation. No, I'm not talking about how 32bit constants are generated. I'm talking about structure offsets. David ��.n��������+%������w��{.n�����{����^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�