On Thu, Oct 03, 2013 at 01:52:45PM -0700, Linus Torvalds wrote:
> On Thu, Oct 3, 2013 at 1:41 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > The problem is this:
> >         A = 1, B = 1
> >
> >         CPU1:
> >         A = 0
> >         <full barrier>
> >         synchronize_rcu()
> >         read B
> >
> >         CPU2:
> >         rcu_read_lock()
> >         B = 0
> >         read A
> >
> > Are we guaranteed that we won't get both of them seeing ones, in
> > situation when that rcu_read_lock() comes too late to be noticed by
> > synchronize_rcu()?
>
> Yeah, I think we should be guaranteed that, because the
> synchronize_rcu() will guarantee that all other CPU's go through an
> idle period. So the "read A" on CPU2 cannot possibly see a 1 _unless_
> it happens so early that synchronize_rcu() definitely sees it (ie it's
> a "preexisting reader" by definition), in which case synchronize_rcu()
> will be waiting for a subsequent idle period, in which case the B=0 on
> CPU2 is not only guaranteed to happen but also be visible out, so the
> "read B" on CPU1 will see 0. And that's true even if CPU2 doesn't have
> an explicit memory barrier, because the "RCU idle" state implies that
> it has gone through a barrier.

I think the reasoning in one direction is actually quite a bit less
obvious than that. rcu_read_unlock() does *not* necessarily imply a
memory barrier (so the B=0 can actually move logically outside the
rcu_read_unlock()), but synchronize_rcu() *does* imply (and enforce)
that a memory barrier has occurred on all CPUs as part of quiescence.
However, likewise, rcu_read_lock() doesn't imply anything in particular
about writes; it does enforce either that reads can't leak earlier, or
that if they do, a synchronize_rcu() will still wait for them; but I
don't think the safety interaction between a *write* in the RCU reader
and a *read* in the RCU writer necessarily follows from that
enforcement. (Also, to the best of my knowledge, you don't even need a
barrier on CPU1; synchronize_rcu() should imply one.)
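[For reference, Al's example is a store-buffering-style test with a grace
period on the writer side. In the litmus-test notation used by herd7 and
the kernel's later tools/memory-model (an illustrative sketch; the test
name and variable names here are made up), the question is whether the
"exists" outcome below, with both CPUs reading the stale 1, is
forbidden:]

```c
C sync-rcu-vs-reader-store

{
	*a = 1;
	*b = 1;
}

P0(int *a, int *b)	/* CPU1: the updater */
{
	int r0;

	WRITE_ONCE(*a, 0);
	synchronize_rcu();
	r0 = READ_ONCE(*b);
}

P1(int *a, int *b)	/* CPU2: the reader, which also writes */
{
	int r1;

	rcu_read_lock();
	WRITE_ONCE(*b, 0);
	r1 = READ_ONCE(*a);
	rcu_read_unlock();
}

exists (0:r0=1 /\ 1:r1=1)
```

[The question under discussion is exactly whether that final "exists"
clause can ever be satisfied by a legitimate RCU implementation.]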
If synchronize_rcu() on CPU1 sees rcu_read_lock() on CPU2, then
synchronize_rcu() will wait for CPU2's read-side critical section, and
for a memory barrier on CPU2, before reading B, so CPU1 will see B==0.

The harder direction: if synchronize_rcu() on CPU1 does not see
rcu_read_lock() on CPU2, then it won't necessarily wait for anything;
and since rcu_read_lock() itself does not imply any CPU write barriers,
it's not at all obvious that anything guarantees CPU2's B=0 becomes
visible before CPU1's read of B.

In short, the interaction between RCU's ordering guarantees and CPU
memory barriers, in the presence of writes on the read side and reads on
the write side, does not seem sufficiently clear to support portable use
of the above pattern without an smp_wmb() on CPU2 between
rcu_read_lock() and B=0.

I think it might happen to work with the current implementations of RCU,
in which synchronize_rcu() won't actually notice a quiescent state and
return until after either the rcu_read_unlock() or a preemption point;
but going by the strict semantic guarantees of the RCU primitives, I
think you could write a legitimate RCU implementation that would break
the above code. That said, I believe this pattern *will* work with every
existing implementation of RCU. Thus, I'd suggest documenting it as a
warning to prospective RCU optimizers to avoid breaking the above
pattern.

- Josh Triplett
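[For concreteness, the defensive variant suggested above would look
roughly like this. This is an illustrative kernel-style C sketch, not a
standalone program; the function names and the local variables are made
up, and smp_wmb(), rcu_read_lock(), and synchronize_rcu() are the usual
kernel primitives:]

```c
int A = 1, B = 1;

/* CPU1: the updater.  No explicit barrier is needed before the grace
 * period; synchronize_rcu() implies one. */
void cpu1_updater(void)
{
	int b;

	A = 0;
	synchronize_rcu();
	b = B;		/* the value we want guaranteed to be 0 if CPU2
			 * could still observe A == 1 */
}

/* CPU2: the reader, which also writes.  The smp_wmb() is the proposed
 * portable fix: order the B = 0 store explicitly instead of relying on
 * rcu_read_lock() to imply any write ordering. */
void cpu2_reader(void)
{
	int a;

	rcu_read_lock();
	smp_wmb();
	B = 0;
	a = A;
	rcu_read_unlock();
}
```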