On Fri, Jan 23, 2015 at 02:08:32PM +0000, Will Deacon wrote:
> On Tue, Jan 20, 2015 at 09:35:45PM +0000, Paul E. McKenney wrote:
> > On Tue, Jan 20, 2015 at 10:34:43AM +0100, Peter Zijlstra wrote:
> > > On Tue, Jan 13, 2015 at 04:33:54PM +0000, Will Deacon wrote:
> > > > I started dusting off a series I've been working on to implement a relaxed
> > > > atomic API in Linux (i.e. things like atomic_read(v, ACQUIRE)) but I'm
> > > > having trouble making sense of the ordering semantics we have in mainline
> > > > today:
> > > 
> > > > 2. Does smp_mb__after_unlock_lock order smp_store_release against
> > > >    smp_load_acquire? Again, Documentation/memory-barriers.txt puts
> > > >    these operations into the RELEASE and ACQUIRE classes respectively,
> > > >    but since smp_mb__after_unlock_lock is a NOP everywhere other than
> > > >    PowerPC, I don't think this is enforced by the current code.
> > > 
> > > Yeah, wasn't Paul going to talk to Ben about that? PPC is the only arch
> > > that has the weak ACQUIRE/RELEASE for its spinlocks.
> > 
> > I thought that you guys were going to propose something and we would see
> > what the reaction was. ;-)
> 
> Well, ripping it out is quite easy (patch below) :)
> 
> I didn't fully grok the comment in the MCS code, since mcs_lock uses an
> xchg, which does have full barrier semantics and therefore mcs_unlock +
> mcs_lock also has full barrier semantics afaict.
> 
> The remaining problem is that this puts PowerPC back into its original state
> (before the barrier) where UNLOCK + LOCK doesn't imply a full barrier, so
> that would need to be fixed.
> 
> Anyway, kicking this out there for comments.

I believe that Ben is still on holiday, so might be a bit.

							Thanx, Paul

> Will
> 
> --->8
> 
> >From 23d1db44f20a12924d2addeb73231f10d90e9f05 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will.deacon@xxxxxxx>
> Date: Fri, 23 Jan 2015 13:55:34 +0000
> Subject: [PATCH] memory-barriers: remove smp_mb__after_unlock_lock()
> 
> smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence
> into a full memory barrier.
> 
> However:
> 
>   - This ordering guarantee is already provided without the barrier on
>     all architectures apart from PowerPC
> 
>   - The barrier only applies to UNLOCK + LOCK, not general
>     RELEASE + ACQUIRE operations
> 
>   - The barrier is not well used outside of RCU and, because it was
>     retrofitted into the kernel, it's not clear whether other areas of
>     the kernel are incorrectly relying on UNLOCK + LOCK implying a full
>     barrier
> 
> This patch removes the barrier and instead requires architectures to
> provide full barrier semantics for an UNLOCK + LOCK sequence.
> 
> TODO: add a barrier to PowerPC unlock?
> 
> Signed-off-by: Will Deacon <will.deacon@xxxxxxx>
> ---
>  Documentation/memory-barriers.txt   | 77 ++-----------------------------------
>  arch/powerpc/include/asm/spinlock.h |  2 -
>  include/linux/spinlock.h            | 10 -----
>  kernel/locking/mcs_spinlock.h       |  9 -----
>  kernel/rcu/tree.c                   | 19 +---------
>  kernel/rcu/tree_plugin.h            | 13 -------
>  6 files changed, 4 insertions(+), 126 deletions(-)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 9c0e3c45a807..fed47737e4c6 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1777,74 +1777,9 @@ RELEASE are to the same lock variable, but only from the perspective of
>  another CPU not holding that lock. In short, a ACQUIRE followed by an
>  RELEASE may -not- be assumed to be a full memory barrier.
> 
> -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
> -imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE
> -pair to produce a full barrier, the ACQUIRE can be followed by an
> -smp_mb__after_unlock_lock() invocation. This will produce a full barrier
> -if either (a) the RELEASE and the ACQUIRE are executed by the same
> -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable.
> -The smp_mb__after_unlock_lock() primitive is free on many architectures.
> -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical
> -sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
> -
> -	*A = a;
> -	RELEASE M
> -	ACQUIRE N
> -	*B = b;
> -
> -could occur as:
> -
> -	ACQUIRE N, STORE *B, STORE *A, RELEASE M
> -
> -It might appear that this reordering could introduce a deadlock.
> -However, this cannot happen because if such a deadlock threatened,
> -the RELEASE would simply complete, thereby avoiding the deadlock.
> -
> -	Why does this work?
> -
> -	One key point is that we are only talking about the CPU doing
> -	the reordering, not the compiler. If the compiler (or, for
> -	that matter, the developer) switched the operations, deadlock
> -	-could- occur.
> -
> -	But suppose the CPU reordered the operations. In this case,
> -	the unlock precedes the lock in the assembly code. The CPU
> -	simply elected to try executing the later lock operation first.
> -	If there is a deadlock, this lock operation will simply spin (or
> -	try to sleep, but more on that later). The CPU will eventually
> -	execute the unlock operation (which preceded the lock operation
> -	in the assembly code), which will unravel the potential deadlock,
> -	allowing the lock operation to succeed.
> -
> -	But what if the lock is a sleeplock? In that case, the code will
> -	try to enter the scheduler, where it will eventually encounter
> -	a memory barrier, which will force the earlier unlock operation
> -	to complete, again unraveling the deadlock. There might be
> -	a sleep-unlock race, but the locking primitive needs to resolve
> -	such races properly in any case.
> -
> -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
> -For example, with the following code, the store to *A will always be
> -seen by other CPUs before the store to *B:
> -
> -	*A = a;
> -	RELEASE M
> -	ACQUIRE N
> -	smp_mb__after_unlock_lock();
> -	*B = b;
> -
> -The operations will always occur in one of the following orders:
> -
> -	STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
> -	STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> -	ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> -
> -If the RELEASE and ACQUIRE were instead both operating on the same lock
> -variable, only the first of these alternatives can occur. In addition,
> -the more strongly ordered systems may rule out some of the above orders.
> -But in any case, as noted earlier, the smp_mb__after_unlock_lock()
> -ensures that the store to *A will always be seen as happening before
> -the store to *B.
> +However, the reverse case of a RELEASE followed by an ACQUIRE _does_
> +imply a full memory barrier when these accesses are performed as a pair
> +of UNLOCK and LOCK operations, irrespective of the lock variable.
> 
>  Locks and semaphores may not provide any guarantee of ordering on UP compiled
>  systems, and so cannot be counted on in such a situation to actually achieve
> @@ -2087,7 +2022,6 @@ However, if the following occurs:
>  	RELEASE M	     [1]
>  	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
>  					ACQUIRE M		     [2]
> -					smp_mb__after_unlock_lock();
>  					ACCESS_ONCE(*F) = f;
>  					ACCESS_ONCE(*G) = g;
>  					RELEASE M	     [2]
> @@ -2105,11 +2039,6 @@ But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
>  	*F, *G or *H preceding ACQUIRE M [2]
>  	*A, *B, *C, *E, *F or *G following RELEASE M [2]
> 
> -Note that the smp_mb__after_unlock_lock() is critically important
> -here: Without it CPU 3 might see some of the above orderings.
> -Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
> -to be seen in order unless CPU 3 holds lock M.
> -
> 
>  ACQUIRES VS I/O ACCESSES
>  ------------------------
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 4dbe072eecbe..523673d7583c 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -28,8 +28,6 @@
>  #include <asm/synch.h>
>  #include <asm/ppc-opcode.h>
> 
> -#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
> -
>  #ifdef CONFIG_PPC64
>  /* use 0x800000yy when locked, where yy == CPU number */
>  #ifdef __BIG_ENDIAN__
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 262ba4ef9a8e..9e8ba39640ce 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -130,16 +130,6 @@ do { \
>  #define smp_mb__before_spinlock()	smp_wmb()
>  #endif
> 
> -/*
> - * Place this after a lock-acquisition primitive to guarantee that
> - * an UNLOCK+LOCK pair act as a full barrier. This guarantee applies
> - * if the UNLOCK and LOCK are executed by the same CPU or if the
> - * UNLOCK and LOCK operate on the same lock variable.
> - */
> -#ifndef smp_mb__after_unlock_lock
> -#define smp_mb__after_unlock_lock()	do { } while (0)
> -#endif
> -
>  /**
>   * raw_spin_unlock_wait - wait until the spinlock gets unlocked
>   * @lock: the spinlock in question.
> diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
> index 4d60986fcbee..6d60c9f7dc87 100644
> --- a/kernel/locking/mcs_spinlock.h
> +++ b/kernel/locking/mcs_spinlock.h
> @@ -42,15 +42,6 @@ do { \
>  #endif
> 
>  /*
> - * Note: the smp_load_acquire/smp_store_release pair is not
> - * sufficient to form a full memory barrier across
> - * cpus for many architectures (except x86) for mcs_unlock and mcs_lock.
> - * For applications that need a full barrier across multiple cpus
> - * with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be
> - * used after mcs_lock.
> - */
> -
> -/*
>   * In order to acquire the lock, the caller should declare a local node and
>   * pass a reference of the node to this function in addition to the lock.
>   * If the lock has already been acquired, then this will proceed to spin
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 7680fc275036..86595ead34ec 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1322,10 +1322,8 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	 * hold it, acquire the root rcu_node structure's lock in order to
>  	 * start one (if needed).
>  	 */
> -	if (rnp != rnp_root) {
> +	if (rnp != rnp_root)
>  		raw_spin_lock(&rnp_root->lock);
> -		smp_mb__after_unlock_lock();
> -	}
> 
>  	/*
>  	 * Get a new grace-period number. If there really is no grace
> @@ -1574,7 +1572,6 @@ static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp)
>  		local_irq_restore(flags);
>  		return;
>  	}
> -	smp_mb__after_unlock_lock();
>  	needwake = __note_gp_changes(rsp, rnp, rdp);
>  	raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	if (needwake)
> @@ -1591,7 +1588,6 @@ static int rcu_gp_init(struct rcu_state *rsp)
> 
>  	rcu_bind_gp_kthread();
>  	raw_spin_lock_irq(&rnp->lock);
> -	smp_mb__after_unlock_lock();
>  	if (!ACCESS_ONCE(rsp->gp_flags)) {
>  		/* Spurious wakeup, tell caller to go back to sleep. */
>  		raw_spin_unlock_irq(&rnp->lock);
> @@ -1617,7 +1613,6 @@ static int rcu_gp_init(struct rcu_state *rsp)
> 
>  	/* Exclude any concurrent CPU-hotplug operations. */
>  	mutex_lock(&rsp->onoff_mutex);
> -	smp_mb__after_unlock_lock(); /* ->gpnum increment before GP! */
> 
>  	/*
>  	 * Set the quiescent-state-needed bits in all the rcu_node
> @@ -1634,7 +1629,6 @@ static int rcu_gp_init(struct rcu_state *rsp)
>  	 */
>  	rcu_for_each_node_breadth_first(rsp, rnp) {
>  		raw_spin_lock_irq(&rnp->lock);
> -		smp_mb__after_unlock_lock();
>  		rdp = this_cpu_ptr(rsp->rda);
>  		rcu_preempt_check_blocked_tasks(rnp);
>  		rnp->qsmask = rnp->qsmaskinit;
> @@ -1684,7 +1678,6 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
>  	/* Clear flag to prevent immediate re-entry. */
>  	if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
>  		raw_spin_lock_irq(&rnp->lock);
> -		smp_mb__after_unlock_lock();
>  		ACCESS_ONCE(rsp->gp_flags) =
>  			ACCESS_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS;
>  		raw_spin_unlock_irq(&rnp->lock);
> @@ -1704,7 +1697,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	struct rcu_node *rnp = rcu_get_root(rsp);
> 
>  	raw_spin_lock_irq(&rnp->lock);
> -	smp_mb__after_unlock_lock();
>  	gp_duration = jiffies - rsp->gp_start;
>  	if (gp_duration > rsp->gp_max)
>  		rsp->gp_max = gp_duration;
> @@ -1730,7 +1722,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	 */
>  	rcu_for_each_node_breadth_first(rsp, rnp) {
>  		raw_spin_lock_irq(&rnp->lock);
> -		smp_mb__after_unlock_lock();
>  		ACCESS_ONCE(rnp->completed) = rsp->gpnum;
>  		rdp = this_cpu_ptr(rsp->rda);
>  		if (rnp == rdp->mynode)
> @@ -1742,7 +1733,6 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	}
>  	rnp = rcu_get_root(rsp);
>  	raw_spin_lock_irq(&rnp->lock);
> -	smp_mb__after_unlock_lock(); /* Order GP before ->completed update. */
>  	rcu_nocb_gp_set(rnp, nocb);
> 
>  	/* Declare grace period done. */
> @@ -1978,7 +1968,6 @@ rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
>  		rnp_c = rnp;
>  		rnp = rnp->parent;
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
> -		smp_mb__after_unlock_lock();
>  		WARN_ON_ONCE(rnp_c->qsmask);
>  	}
> 
> @@ -2009,7 +1998,6 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp)
> 
>  	rnp = rdp->mynode;
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();
>  	if (rdp->passed_quiesce == 0 || rdp->gpnum != rnp->gpnum ||
>  	    rnp->completed == rnp->gpnum) {
> 
> @@ -2224,7 +2212,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
>  	mask = rdp->grpmask;	/* rnp->grplo is constant. */
>  	do {
>  		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
> -		smp_mb__after_unlock_lock();
>  		rnp->qsmaskinit &= ~mask;
>  		if (rnp->qsmaskinit != 0) {
>  			if (rnp != rdp->mynode)
> @@ -2437,7 +2424,6 @@ static void force_qs_rnp(struct rcu_state *rsp,
>  		cond_resched_rcu_qs();
>  		mask = 0;
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
> -		smp_mb__after_unlock_lock();
>  		if (!rcu_gp_in_progress(rsp)) {
>  			raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  			return;
> @@ -2467,7 +2453,6 @@ static void force_qs_rnp(struct rcu_state *rsp,
>  	rnp = rcu_get_root(rsp);
>  	if (rnp->qsmask == 0) {
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
> -		smp_mb__after_unlock_lock();
>  		rcu_initiate_boost(rnp, flags); /* releases rnp->lock. */
>  	}
>  }
> @@ -2500,7 +2485,6 @@ static void force_quiescent_state(struct rcu_state *rsp)
> 
>  	/* Reached the root of the rcu_node tree, acquire lock. */
>  	raw_spin_lock_irqsave(&rnp_old->lock, flags);
> -	smp_mb__after_unlock_lock();
>  	raw_spin_unlock(&rnp_old->fqslock);
>  	if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
>  		rsp->n_force_qs_lh++;
> @@ -2625,7 +2609,6 @@ static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
>  		struct rcu_node *rnp_root = rcu_get_root(rsp);
> 
>  		raw_spin_lock(&rnp_root->lock);
> -		smp_mb__after_unlock_lock();
>  		needwake = rcu_start_gp(rsp);
>  		raw_spin_unlock(&rnp_root->lock);
>  		if (needwake)
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3ec85cb5d544..a89523605545 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -180,7 +180,6 @@ static void rcu_preempt_note_context_switch(void)
>  		rdp = this_cpu_ptr(rcu_preempt_state.rda);
>  		rnp = rdp->mynode;
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
> -		smp_mb__after_unlock_lock();
>  		t->rcu_read_unlock_special.b.blocked = true;
>  		t->rcu_blocked_node = rnp;
> 
> @@ -287,7 +286,6 @@ static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags)
>  	mask = rnp->grpmask;
>  	raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
>  	raw_spin_lock(&rnp_p->lock);	/* irqs already disabled. */
> -	smp_mb__after_unlock_lock();
>  	rcu_report_qs_rnp(mask, &rcu_preempt_state, rnp_p, flags);
>  }
> 
> @@ -362,7 +360,6 @@ void rcu_read_unlock_special(struct task_struct *t)
>  		for (;;) {
>  			rnp = t->rcu_blocked_node;
>  			raw_spin_lock(&rnp->lock);  /* irqs already disabled. */
> -			smp_mb__after_unlock_lock();
>  			if (rnp == t->rcu_blocked_node)
>  				break;
>  			raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> @@ -576,7 +573,6 @@ static int rcu_preempt_offline_tasks(struct rcu_state *rsp,
>  	while (!list_empty(lp)) {
>  		t = list_entry(lp->next, typeof(*t), rcu_node_entry);
>  		raw_spin_lock(&rnp_root->lock); /* irqs already disabled */
> -		smp_mb__after_unlock_lock();
>  		list_del(&t->rcu_node_entry);
>  		t->rcu_blocked_node = rnp_root;
>  		list_add(&t->rcu_node_entry, lp_root);
> @@ -601,7 +597,6 @@ static int rcu_preempt_offline_tasks(struct rcu_state *rsp,
>  	 * in this case.
>  	 */
>  	raw_spin_lock(&rnp_root->lock); /* irqs already disabled */
> -	smp_mb__after_unlock_lock();
>  	if (rnp_root->boost_tasks != NULL &&
>  	    rnp_root->boost_tasks != rnp_root->gp_tasks &&
>  	    rnp_root->boost_tasks != rnp_root->exp_tasks)
> @@ -732,7 +727,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
>  	unsigned long mask;
> 
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();
>  	for (;;) {
>  		if (!sync_rcu_preempt_exp_done(rnp)) {
>  			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> @@ -750,7 +744,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
>  		raw_spin_unlock(&rnp->lock); /* irqs remain disabled */
>  		rnp = rnp->parent;
>  		raw_spin_lock(&rnp->lock); /* irqs already disabled */
> -		smp_mb__after_unlock_lock();
>  		rnp->expmask &= ~mask;
>  	}
>  }
> @@ -770,7 +763,6 @@ sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp)
>  	int must_wait = 0;
> 
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();
>  	if (list_empty(&rnp->blkd_tasks)) {
>  		raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	} else {
> @@ -850,7 +842,6 @@ void synchronize_rcu_expedited(void)
>  	/* Initialize ->expmask for all non-leaf rcu_node structures. */
>  	rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) {
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
> -		smp_mb__after_unlock_lock();
>  		rnp->expmask = rnp->qsmaskinit;
>  		raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	}
> @@ -1131,7 +1122,6 @@ static int rcu_boost(struct rcu_node *rnp)
>  		return 0;  /* Nothing left to boost. */
> 
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();
> 
>  	/*
>  	 * Recheck under the lock: all tasks in need of boosting
> @@ -1323,7 +1313,6 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
>  	if (IS_ERR(t))
>  		return PTR_ERR(t);
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();
>  	rnp->boost_kthread_task = t;
>  	raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	sp.sched_priority = kthread_prio;
> @@ -1717,7 +1706,6 @@ static void rcu_prepare_for_idle(void)
>  			continue;
>  		rnp = rdp->mynode;
>  		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> -		smp_mb__after_unlock_lock();
>  		needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
>  		raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
>  		if (needwake)
> @@ -2226,7 +2214,6 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
>  	struct rcu_node *rnp = rdp->mynode;
> 
>  	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	smp_mb__after_unlock_lock();
>  	needwake = rcu_start_future_gp(rnp, rdp, &c);
>  	raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	if (needwake)
> -- 
> 2.1.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
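
As an aside for readers who want to experiment with the ordering property under
discussion, the following user-space C11 sketch models the *A/*B example from
memory-barriers.txt. It is only an analogue, with assumptions: the "locks" M and
N are hand-rolled from C11 atomics rather than real spinlocks, the file name and
thread roles are made up for illustration, and the seq_cst fence stands in for
the full-barrier semantics the patch requires an UNLOCK + LOCK sequence to
provide (the ordering smp_mb__after_unlock_lock() supplied on PowerPC). Without
that fence, release + acquire alone would not forbid an independent observer
from seeing the store to B before the store to A.

/*
 * unlock_lock_sketch.c (hypothetical name) - user-space analogue only,
 * not kernel code. Build with: cc -std=c11 -pthread unlock_lock_sketch.c
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int A, B;
static atomic_int M = 1;	/* "lock" M starts out held by the writer */
static atomic_int N;		/* "lock" N starts out free */

static void *writer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&A, 1, memory_order_relaxed);	/* *A = a;   */
	atomic_store_explicit(&M, 0, memory_order_release);	/* RELEASE M */
	while (atomic_exchange_explicit(&N, 1, memory_order_acquire))
		;						/* ACQUIRE N */
	/*
	 * Stand-in for the full barrier that UNLOCK M + LOCK N must imply.
	 * Without this fence, release + acquire alone would not order the
	 * store to A against the store to B for an independent observer.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&B, 1, memory_order_relaxed);	/* *B = b;   */
	return NULL;
}

static void *observer(void *arg)
{
	(void)arg;
	int b = atomic_load_explicit(&B, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* read-side ordering */
	int a = atomic_load_explicit(&A, memory_order_relaxed);

	/* With the full fence above, "B visible but A stale" is forbidden. */
	if (b == 1)
		assert(a == 1);
	return NULL;
}

int main(void)
{
	pthread_t w, o;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&o, NULL, observer, NULL);
	pthread_join(w, NULL);
	pthread_join(o, NULL);
	puts("ok");
	return 0;
}

A single run will not demonstrate reordering either way; the sketch just makes
the pattern concrete: drop the seq_cst fence and the C11 memory model (like the
kernel's, on weakly ordered hardware such as PowerPC without a sync in unlock)
would permit the observer to see b == 1 with a == 0.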