On Wed, Mar 17, 2010 at 01:42:43PM +0000, Jamie Lokier wrote:
> Benjamin Herrenschmidt wrote:
> > Maybe we can agree on a set of relaxed accessors defined specifically
> > with those semantics (more relaxed implies use of raw_*) :
> >
> >  - order is guaranteed between MMIOs
> >  - no order is guaranteed between MMIOs and spinlocks
>
> No order between MMIOs and spinlocks will be fun :-)

There isn't anyway, and things are pretty messed up in this area
already. We have mmiowb. Some architectures do a bit of mmio vs lock
synchronization, eg. powerpc. But they don't do barriers on some of the
other lock primitives like bit spinlocks or mutexes or rwsems or
semaphores blah blah. When this came up I grepped a couple of drivers
and found possible problems straight away.

So IMO we need to take all these out of the lock primitives and just
increase awareness of the issue. Get rid of mmiowb. wmb() should be
enough to keep mmio stores before the store that drops any lock (by
definition).

Actually I think having an io_acquire_barrier() / io_release_barrier()
for the purpose of keeping ios inside locks is a good idea (paired
inside the actual lock/unlock functions). That basically gives them a
self-documenting property.

> > - a read access is not guaranteed to be performed until the read value
> >   is actually consumed by the processor
>
> How do you define 'consumed'? It's not obvious: see
> read_barrier_depends() on Alpha, and elsewhere speculative read
> optimisations (including by the compiler on IA64, in theory).
>
> > Along with barriers along the line of (that's where we may want to
> > discuss a bit I believe)
> >
> >  - io_to_mem_rbarrier() : orders MMIO read vs. subsequent memory read
> >  - mem_to_io_wbarrier() : orders memory write vs. subsequent MMIO write
> >                           (includes spin_unlock ?)
> >  - io_to_mem_barrier()  : orders any MMIO vs.
> >                           subsequent memory accesses
> >  - mem_to_io_barrier()  : orders memory accesses vs. any subsequent MMIO
> >  - flush_io_read(val)   : takes the value from an MMIO read, enforces
> >                           that it's completed before further
> >                           instructions are issued; acts as a compiler
> >                           and execution barrier.
> >
> > What do you guys think ? I think io_to_mem_rbarrier() and
> > mem_to_io_wbarrier() cover most useful cases, except maybe the IO vs.
> > locks.
>
> I'm thinking the current scheme with "heavy" barriers that serialise
> everything is better - because it's simple to understand. I think the
> above set are likely to be accidentally misused, and even pass review
> occasionally when they are wrong, even if it's just due to brain farts.
>
> The "heavy" barrier names could change from rmb/mb/wmb to
> io_rmb/io_mb/io_wmb or something.
>
> Is there any real performance advantage expected from more
> fine-grained MMIO barriers?
>
> Another approach is a flags argument to a single macro:
> barrier(BARRIER_MMIO_READ_BEFORE | BARRIER_MEM_READ_AFTER), etc.

That's just syntax, really. Like you, I also prefer sticking to the
simpler barriers. I would like to see any new barrier introduced on a
case-by-case basis, with before/after numbers, and allow discussion on
whether it can be avoided. So many cases where people get even the
simple barriers wrong...

--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
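[Editor's note: a minimal standalone sketch of the pattern argued for
above, i.e. an explicit wmb() ordering an MMIO store before the store
that drops the lock, instead of relying on mmiowb() or on barriers
hidden in the lock primitives. The kernel primitives here are stubbed
as event recorders so the ordering intent can be shown outside a
kernel build; `run_locked_mmio`, `DEV_CTRL`, and the recorder globals
are all illustrative names, not anything from the original thread.]

```c
#include <assert.h>
#include <string.h>

/* Event log standing in for observable ordering at the interconnect. */
static const char *events[8];
static int nevents;
static void record(const char *e) { events[nevents++] = e; }

/* Stubs for the real kernel primitives (illustration only). */
static void spin_lock(void)   { record("lock"); }
static void spin_unlock(void) { record("unlock"); }
static void writel(unsigned int val, const char *reg)
{
        (void)val; (void)reg;
        record("mmio_store");
}
static void wmb(void) { record("wmb"); }

/* Driver critical section: the MMIO store must not leak past the
 * unlock. The explicit wmb() orders the device-register store before
 * the store that releases the lock -- the role mmiowb() used to play
 * when it was hidden inside the lock primitives. */
static void run_locked_mmio(void)
{
        spin_lock();
        writel(0x1, "DEV_CTRL");  /* device register write under the lock */
        wmb();                    /* MMIO store ordered before unlock store */
        spin_unlock();
}
```

With the barrier visible at the call site rather than buried in
spin_unlock(), the ordering requirement is self-documenting, which is
the property the io_acquire_barrier()/io_release_barrier() suggestion
aims at.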