Re: bit fields && data tearing

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Sun, 7 Sep 2014 16:00:19 -0700

On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
> On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
> > On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
> > > On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
> > > > > Hi James,
> > > > > 
> > > > > On 09/04/2014 10:11 PM, James Bottomley wrote:
> > > > > > On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
> > > > > >> +And there are anti-guarantees:
> > > > > >> +
> > > > > >> + (*) These guarantees do not apply to bitfields, because compilers often
> > > > > >> +     generate code to modify these using non-atomic read-modify-write
> > > > > >> +     sequences.  Do not attempt to use bitfields to synchronize parallel
> > > > > >> +     algorithms.
> > > > > >> +
> > > > > >> + (*) Even in cases where bitfields are protected by locks, all fields
> > > > > >> +     in a given bitfield must be protected by one lock.  If two fields
> > > > > >> +     in a given bitfield are protected by different locks, the compiler's
> > > > > >> +     non-atomic read-modify-write sequences can cause an update to one
> > > > > >> +     field to corrupt the value of an adjacent field.
> > > > > >> +
> > > > > >> + (*) These guarantees apply only to properly aligned and sized scalar
> > > > > >> +     variables.  "Properly sized" currently means "int" and "long",
> > > > > >> +     because some CPU families do not support loads and stores of
> > > > > >> +     other sizes.  ("Some CPU families" is currently believed to
> > > > > >> +     be only Alpha 21064.  If this is actually the case, a different
> > > > > >> +     non-guarantee is likely to be formulated.)
> > > > > > 
> > > > > > This is a bit unclear.  Presumably you're talking about definiteness of
> > > > > > the outcome (as in what's seen after multiple stores to the same
> > > > > > variable).
> > > > > 
> > > > > > No, the last condition refers to adjacent byte stores from different
> > > > > cpu contexts (either interrupt or SMP).
> > > > > 
> > > > > > The guarantees are only for natural width on Parisc as well,
> > > > > > so you would get a mess if you did byte stores to adjacent memory
> > > > > > locations.
> > > > > 
> > > > > For a simple test like:
> > > > > 
> > > > > struct x {
> > > > > 	long a;
> > > > > 	char b;
> > > > > 	char c;
> > > > > 	char d;
> > > > > 	char e;
> > > > > };
> > > > > 
> > > > > void store_bc(struct x *p) {
> > > > > 	p->b = 1;
> > > > > 	p->c = 2;
> > > > > }
> > > > > 
> > > > > on parisc, gcc generates separate byte stores
> > > > > 
> > > > > void store_bc(struct x *p) {
> > > > >    0:	34 1c 00 02 	ldi 1,ret0
> > > > >    4:	0f 5c 12 08 	stb ret0,4(r26)
> > > > >    8:	34 1c 00 04 	ldi 2,ret0
> > > > >    c:	e8 40 c0 00 	bv r0(rp)
> > > > >   10:	0f 5c 12 0a 	stb ret0,5(r26)
> > > > > 
> > > > > which appears to confirm that on parisc adjacent byte data
> > > > > is safe from corruption by concurrent cpu updates; that is,
> > > > > 
> > > > > CPU 0                | CPU 1
> > > > >                      |
> > > > > p->b = 1             | p->c = 2
> > > > >                      |
> > > > > 
> > > > > will result in p->b == 1 && p->c == 2 (assume both values
> > > > > were 0 before the call to store_bc()).
> > > > 
> > > > What Peter said.  I would ask for suggestions for better wording, but
> > > > I would much rather be able to say that single-byte reads and writes
> > > > are atomic and that aligned-short reads and writes are also atomic.
> > > > 
> > > > Thus far, it looks like we lose only very old Alpha systems, so unless
> > > > I hear otherwise, I will update my patch to outlaw these very old systems.
> > > 
> > > This isn't universally true according to the architecture manual.  The
> > > PARISC CPU can make byte to long word stores atomic against the memory
> > > bus but not against the I/O bus for instance.  Atomicity is a property
> > > of the underlying substrate, not of the CPU.  Implying that atomicity is
> > > a CPU property is incorrect.
> > 
> > OK, fair point.
> > 
> > But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
> > not I/O) that do not support single-byte and double-byte stores?
> 
> For aligned access, I believe that's always the case for the memory bus
> (on both 32 and 64 bit systems).  However, it only applies to machine
> instruction loads and stores of the same width.  If you mix the widths
> on the loads and stores, all bets are off.  That means you have to
> beware of the gcc penchant for coalescing loads and stores: if it sees
> two adjacent byte stores it can coalesce them into a short store
> instead ... that screws up the atomicity guarantees.

OK, that means that to make PARISC work reliably, we need to use
ACCESS_ONCE() for loads and stores that could have racing accesses.
If I understand correctly, this will -not- be needed for code guarded
by locks, even with Peter's examples.

So if we have something like this:

	struct foo {
		char a;
		char b;
	};
	struct foo *fp;

then this code would be bad:

	fp->a = 1;
	fp->b = 2;

The reason is (as you say) that GCC would be happy to merge the two byte
stores into a single 16-bit store of 0x0102 (or 0x0201, depending on
endianness) to the pair.  We instead need:

	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 2;
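
For anyone who wants to look at the generated code outside the kernel,
here is a minimal userspace sketch.  It assumes GCC (for __typeof__) and
defines its own ACCESS_ONCE() as a volatile cast, which mirrors the kernel
macro but is only a stand-in.  The function names store_plain(),
store_once(), and demo() are made up for illustration.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's ACCESS_ONCE(); assumes GCC. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct foo {
	char a;
	char b;
};

/* Plain stores: the compiler is free to merge these into one wider store. */
static void store_plain(struct foo *fp)
{
	fp->a = 1;
	fp->b = 2;
}

/* Volatile stores: GCC emits each one as its own byte-sized access. */
static void store_once(struct foo *fp)
{
	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 2;
}

/* Single-threaded check: both variants leave the same final values. */
void demo(void)
{
	struct foo f = { 0, 0 };
	struct foo g = { 0, 0 };

	store_plain(&f);
	store_once(&g);
	assert(f.a == 1 && f.b == 2);
	assert(g.a == 1 && g.b == 2);
}
```

The two functions differ only in which store instructions the compiler
may emit, so the interesting part is comparing their disassembly at -O2
rather than their results.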

However, if the code is protected by locks, no problem:

	struct foo {
		spinlock_t lock_a;
		spinlock_t lock_b;
		char a;
		char b;
	};

Then it is OK to do the following:

	spin_lock(&fp->lock_a);
	fp->a = 1;
	spin_unlock(&fp->lock_a);
	spin_lock(&fp->lock_b);
	fp->b = 1;
	spin_unlock(&fp->lock_b);

Or even this, assuming ->lock_a precedes ->lock_b in the locking hierarchy:

	spin_lock(&fp->lock_a);
	spin_lock(&fp->lock_b);
	fp->a = 1;
	fp->b = 1;
	spin_unlock(&fp->lock_a);
	spin_unlock(&fp->lock_b);

Here gcc might merge the assignments to fp->a and fp->b, but that is OK
because both locks are held, presumably preventing other assignments or
references to fp->a and fp->b.
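
A rough userspace rendering of the both-locks-held case, in case it helps
to have something compilable.  This is only a sketch: pthread_mutex_t
stands in for spinlock_t, and struct locked_foo, locked_foo_init(),
locked_foo_set_both(), and demo_locked() are hypothetical names, not
kernel APIs.

```c
#include <assert.h>
#include <pthread.h>

/* pthread mutexes stand in for the kernel's spinlock_t here. */
struct locked_foo {
	pthread_mutex_t lock_a;	/* protects ->a */
	pthread_mutex_t lock_b;	/* protects ->b */
	char a;
	char b;
};

static void locked_foo_init(struct locked_foo *fp)
{
	pthread_mutex_init(&fp->lock_a, NULL);
	pthread_mutex_init(&fp->lock_b, NULL);
	fp->a = 0;
	fp->b = 0;
}

/*
 * Acquire lock_a before lock_b, matching the assumed locking hierarchy.
 * With both locks held, a merged store to ->a and ->b is harmless.
 */
static void locked_foo_set_both(struct locked_foo *fp, char a, char b)
{
	pthread_mutex_lock(&fp->lock_a);
	pthread_mutex_lock(&fp->lock_b);
	fp->a = a;
	fp->b = b;
	pthread_mutex_unlock(&fp->lock_b);
	pthread_mutex_unlock(&fp->lock_a);
}

/* Single-threaded sanity check of the helper above. */
void demo_locked(void)
{
	struct locked_foo f;

	locked_foo_init(&f);
	locked_foo_set_both(&f, 1, 1);
	assert(f.a == 1 && f.b == 1);
}
```

This only works if every other access to ->a and ->b also holds the
corresponding lock.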

On the other hand, if either fp->a or fp->b is referenced outside of its
respective lock, even once, then this last code fragment would still need
ACCESS_ONCE() as follows:

	spin_lock(&fp->lock_a);
	spin_lock(&fp->lock_b);
	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 1;
	spin_unlock(&fp->lock_a);
	spin_unlock(&fp->lock_b);
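
To make the "referenced outside of the lock" case concrete, here is a
hedged userspace sketch with a lockless reader.  As before, it assumes
GCC, uses a volatile-cast stand-in for ACCESS_ONCE(), and substitutes
pthread_mutex_t for spinlock_t.  The names struct mixed_foo,
mixed_foo_update(), mixed_foo_peek_a(), and demo_mixed() are invented
for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for the kernel's ACCESS_ONCE(); assumes GCC. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct mixed_foo {
	pthread_mutex_t lock_a;	/* stands in for spinlock_t lock_a */
	pthread_mutex_t lock_b;	/* stands in for spinlock_t lock_b */
	char a;
	char b;
};

/* Writer: holds both locks, but still stores each byte separately. */
static void mixed_foo_update(struct mixed_foo *fp)
{
	pthread_mutex_lock(&fp->lock_a);
	pthread_mutex_lock(&fp->lock_b);
	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 1;
	pthread_mutex_unlock(&fp->lock_b);
	pthread_mutex_unlock(&fp->lock_a);
}

/* Lockless reader: this racing load is why the writer needs ACCESS_ONCE(). */
static char mixed_foo_peek_a(struct mixed_foo *fp)
{
	return ACCESS_ONCE(fp->a);
}

/* Single-threaded sanity check of the two helpers. */
void demo_mixed(void)
{
	struct mixed_foo f = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, 0, 0
	};

	assert(mixed_foo_peek_a(&f) == 0);
	mixed_foo_update(&f);
	assert(mixed_foo_peek_a(&f) == 1);
}
```

The reader uses ACCESS_ONCE() as well, so that its load stays a single
byte-sized access matching the width of the writer's stores.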

Does that cover it?  If so, I will update memory-barriers.txt accordingly.

							Thanx, Paul
