On Sun, Sep 07, 2014 at 04:17:30PM -0700, H. Peter Anvin wrote:
> I'm confused why storing 0x0102 would be a problem.  I think gcc does that even on other cpus.
>
> More atomicity can't hurt, can it?

I must defer to James for any additional details on why PARISC systems
don't provide atomicity for partially overlapping stores.  ;-)

                                                        Thanx, Paul

> On September 7, 2014 4:00:19 PM PDT, "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
> >> On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
> >> > On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
> >> > > On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
> >> > > > On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
> >> > > > > Hi James,
> >> > > > >
> >> > > > > On 09/04/2014 10:11 PM, James Bottomley wrote:
> >> > > > > > On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
> >> > > > > >> +And there are anti-guarantees:
> >> > > > > >> +
> >> > > > > >> + (*) These guarantees do not apply to bitfields, because compilers often
> >> > > > > >> +     generate code to modify these using non-atomic read-modify-write
> >> > > > > >> +     sequences.  Do not attempt to use bitfields to synchronize parallel
> >> > > > > >> +     algorithms.
> >> > > > > >> +
> >> > > > > >> + (*) Even in cases where bitfields are protected by locks, all fields
> >> > > > > >> +     in a given bitfield must be protected by one lock.  If two fields
> >> > > > > >> +     in a given bitfield are protected by different locks, the compiler's
> >> > > > > >> +     non-atomic read-modify-write sequences can cause an update to one
> >> > > > > >> +     field to corrupt the value of an adjacent field.
> >> > > > > >> +
> >> > > > > >> + (*) These guarantees apply only to properly aligned and sized scalar
> >> > > > > >> +     variables.  "Properly sized" currently means "int" and "long",
> >> > > > > >> +     because some CPU families do not support loads and stores of
> >> > > > > >> +     other sizes.  ("Some CPU families" is currently believed to
> >> > > > > >> +     be only Alpha 21064.  If this is actually the case, a different
> >> > > > > >> +     non-guarantee is likely to be formulated.)
> >> > > > > >
> >> > > > > > This is a bit unclear.  Presumably you're talking about definiteness of
> >> > > > > > the outcome (as in what's seen after multiple stores to the same
> >> > > > > > variable).
> >> > > > >
> >> > > > > No, the last condition refers to adjacent byte stores from different
> >> > > > > cpu contexts (either interrupt or SMP).
> >> > > > >
> >> > > > > > The guarantees are only for natural width on Parisc as well,
> >> > > > > > so you would get a mess if you did byte stores to adjacent memory
> >> > > > > > locations.
> >> > > > >
> >> > > > > For a simple test like:
> >> > > > >
> >> > > > > struct x {
> >> > > > >     long a;
> >> > > > >     char b;
> >> > > > >     char c;
> >> > > > >     char d;
> >> > > > >     char e;
> >> > > > > };
> >> > > > >
> >> > > > > void store_bc(struct x *p) {
> >> > > > >     p->b = 1;
> >> > > > >     p->c = 2;
> >> > > > > }
> >> > > > >
> >> > > > > on parisc, gcc generates separate byte stores
> >> > > > >
> >> > > > > void store_bc(struct x *p) {
> >> > > > >    0:   34 1c 00 02     ldi 1,ret0
> >> > > > >    4:   0f 5c 12 08     stb ret0,4(r26)
> >> > > > >    8:   34 1c 00 04     ldi 2,ret0
> >> > > > >    c:   e8 40 c0 00     bv r0(rp)
> >> > > > >   10:   0f 5c 12 0a     stb ret0,5(r26)
> >> > > > >
> >> > > > > which appears to confirm that on parisc adjacent byte data
> >> > > > > is safe from corruption by concurrent cpu updates; that is,
> >> > > > >
> >> > > > >  CPU 0                | CPU 1
> >> > > > >                       |
> >> > > > >  p->b = 1             | p->c = 2
> >> > > > >                       |
> >> > > > >
> >> > > > > will result in p->b == 1 && p->c == 2 (assume both values
> >> > > > > were 0 before the call to store_bc()).
> >> > > > >
> >> > > >
> >> > > > What Peter said.  I would ask for suggestions for better wording, but
> >> > > > I would much rather be able to say that single-byte reads and writes
> >> > > > are atomic and that aligned-short reads and writes are also atomic.
> >> > > >
> >> > > > Thus far, it looks like we lose only very old Alpha systems, so unless
> >> > > > I hear otherwise, I will update my patch to outlaw these very old systems.
> >> > >
> >> > > This isn't universally true according to the architecture manual.  The
> >> > > PARISC CPU can make byte to long word stores atomic against the memory
> >> > > bus but not against the I/O bus for instance.  Atomicity is a property
> >> > > of the underlying substrate, not of the CPU.  Implying that atomicity is
> >> > > a CPU property is incorrect.
> >> >
> >> > OK, fair point.
> >> >
> >> > But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
> >> > not I/O) that do not support single-byte and double-byte stores?
> >>
> >> For aligned access, I believe that's always the case for the memory bus
> >> (on both 32 and 64 bit systems).  However, it only applies to machine
> >> instruction loads and stores of the same width.  If you mix the widths
> >> on the loads and stores, all bets are off.  That means you have to
> >> beware of the gcc penchant for coalescing loads and stores: if it sees
> >> two adjacent byte stores it can coalesce them into a short store
> >> instead ... that screws up the atomicity guarantees.
> >
> >OK, that means that to make PARISC work reliably, we need to use
> >ACCESS_ONCE() for loads and stores that could have racing accesses.
> >If I understand correctly, this will -not- be needed for code guarded
> >by locks, even with Peter's examples.
> >
> >So if we have something like this:
> >
> >    struct foo {
> >        char a;
> >        char b;
> >    };
> >    struct foo *fp;
> >
> >then this code would be bad:
> >
> >    fp->a = 1;
> >    fp->b = 2;
> >
> >The reason is (as you say) that GCC would be happy to store 0x0102
> >(or vice versa, depending on endianness) to the pair.  We instead
> >need:
> >
> >    ACCESS_ONCE(fp->a) = 1;
> >    ACCESS_ONCE(fp->b) = 2;
> >
> >However, if the code is protected by locks, no problem:
> >
> >    struct foo {
> >        spinlock_t lock_a;
> >        spinlock_t lock_b;
> >        char a;
> >        char b;
> >    };
> >
> >Then it is OK to do the following:
> >
> >    spin_lock(&fp->lock_a);
> >    fp->a = 1;
> >    spin_unlock(&fp->lock_a);
> >    spin_lock(&fp->lock_b);
> >    fp->b = 1;
> >    spin_unlock(&fp->lock_b);
> >
> >Or even this, assuming ->lock_a precedes ->lock_b in the locking
> >hierarchy:
> >
> >    spin_lock(&fp->lock_a);
> >    spin_lock(&fp->lock_b);
> >    fp->a = 1;
> >    fp->b = 1;
> >    spin_unlock(&fp->lock_a);
> >    spin_unlock(&fp->lock_b);
> >
> >Here gcc might merge the assignments to fp->a and fp->b, but that is OK
> >because both locks are held, presumably preventing other assignments or
> >references to fp->a and fp->b.
> >
> >On the other hand, if either fp->a or fp->b is referenced outside of its
> >respective lock, even once, then this last code fragment would still need
> >ACCESS_ONCE() as follows:
> >
> >    spin_lock(&fp->lock_a);
> >    spin_lock(&fp->lock_b);
> >    ACCESS_ONCE(fp->a) = 1;
> >    ACCESS_ONCE(fp->b) = 1;
> >    spin_unlock(&fp->lock_a);
> >    spin_unlock(&fp->lock_b);
> >
> >Does that cover it?  If so, I will update memory-barriers.txt
> >accordingly.
> >
> >                                                        Thanx, Paul
>
> --
> Sent from my mobile phone.  Please pardon brevity and lack of formatting.
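Below is a minimal userspace sketch, not kernel code, of the two compiler-side hazards discussed in this thread. The first hazard is two adjacent byte stores being merged into one 16-bit store. The second is a store to one byte being widened into a load/modify/store of the whole pair, which can wipe out another CPU's store to the neighboring byte.

The sketch makes these assumptions:
- It assumes GCC or Clang, because it uses __typeof__.
- It defines a local ACCESS_ONCE() macro modeled on the kernel's definition of this era.
- The helpers merged_store_value(), host_is_little_endian(), widened_rmw_race(), byte_stores_race() and run_demo() are hypothetical names chosen for illustration.
- The "race" is replayed step by step in a fixed order. It shows the lost update deterministically rather than provoking a real one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local stand-in modeled on the kernel's ACCESS_ONCE() of this period.
 * Accessing the field through a volatile lvalue keeps the compiler from
 * merging, widening, or omitting the access; in practice GCC emits one
 * access of the field's own width.  Assumes GCC/Clang for __typeof__.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct foo {
    char a;
    char b;
};

/* 16-bit value a merged store of "a = 1; b = 2;" would write to the pair. */
uint16_t merged_store_value(void)
{
    struct foo tmp = { .a = 1, .b = 2 };
    uint16_t v;

    memcpy(&v, &tmp, sizeof(v));   /* reinterpret the two bytes as a short */
    return v;                      /* 0x0102 big-endian, 0x0201 little-endian */
}

int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);     /* lowest-addressed byte of the short */
    return first == 1;
}

/*
 * Hypothetical widened update: "CPU 0" only means to set ->a, but its code
 * does a 16-bit load/modify/store of the whole pair, and "CPU 1" stores
 * ->b in between.  Steps are replayed in a fixed order; no threads.
 */
struct foo widened_rmw_race(void)
{
    struct foo f = { 0, 0 };
    struct foo snapshot;

    memcpy(&snapshot, &f, sizeof(f)); /* CPU 0: load both bytes          */
    ACCESS_ONCE(f.b) = 2;             /* CPU 1: single-byte store to ->b */
    snapshot.a = 1;                   /* CPU 0: modify only its own byte */
    memcpy(&f, &snapshot, sizeof(f)); /* CPU 0: store both bytes back    */
    return f;                         /* ->b is 0 again: CPU 1's store lost */
}

/* Same two updates, but each CPU issues only a single-byte store. */
struct foo byte_stores_race(void)
{
    struct foo f = { 0, 0 };

    ACCESS_ONCE(f.b) = 2;             /* CPU 1 */
    ACCESS_ONCE(f.a) = 1;             /* CPU 0 */
    return f;                         /* both updates survive */
}

/* Checks the outcomes described in the comments above. */
void run_demo(void)
{
    struct foo lost = widened_rmw_race();
    struct foo kept = byte_stores_race();

    assert(lost.a == 1 && lost.b == 0);
    assert(kept.a == 1 && kept.b == 2);
    assert(merged_store_value() ==
           (host_is_little_endian() ? 0x0201 : 0x0102));
}
```

Calling run_demo() checks three outcomes:
- The widened read-modify-write discards CPU 1's store to ->b.
- Separate single-byte stores keep both values.
- The merged value is 0x0102 on a big-endian host such as PARISC and 0x0201 on a little-endian one.

The sketch only models what the compiler may emit. Whether the hardware keeps mixed-width stores atomic is a separate question that, per James, depends on the memory fabric.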