On Thu, Oct 31, 2019 at 10:08:00AM +0000, Lai Jiangshan wrote: > rcu_preempt_deferred_qs_irqrestore() is always called when > ->rcu_read_lock_nesting <= 0 and there is nothing to prevent it > from reporting qs if needed which means ->rcu_read_unlock_special > is better to be clearred after the function. But in some cases, > it leaves exp_hint (for example, after rcu_note_context_switch()), > which is harmess since it is just a hint, but it may also intruduce > unneeded rcu_read_unlock_special() later. > > The new code write to exp_hint without WRITE_ONCE() since the > writing is protected by irq. > > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx> This does look good, thank you! Though for a rather different reason that called out in the commit log. Note that .exp_hint is in task_struct, and thus follows the task, and is currently unconditionally cleared by the next rcu_read_unlock_special(), which will be invoked because .exp_hint is non-zero. So if rcu_note_context_switch() leaves .exp_hint set, that is all to the good because the next rcu_read_unlock() executed by this task once resumed, and because that rcu_read_unlock() might be transitioning to a quiescent state required in order for the expedited grace period to end. Nevertheless, clearing .exp_hint in rcu_preempt_deferred_qs_irqrestore() instead of rcu_read_unlock_special() is a good thing. This is because in the case where it is not possible to arrange an event immediately following reenabling, it is good to enable any rcu_read_unlock()s that might be executed to help us out. Or am I missing something here? > --- > kernel/rcu/tree_plugin.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > index 9fe8138ed3c3..bb3bcdb5c9b8 100644 > --- a/kernel/rcu/tree_plugin.h > +++ b/kernel/rcu/tree_plugin.h > @@ -439,6 +439,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) > * t->rcu_read_unlock_special cannot change. > */ > special = t->rcu_read_unlock_special; > + t->rcu_read_unlock_special.b.exp_hint = false; Interrupts are disabled by this time, so yes, it is safe for this to be a simple assignment. > t->rcu_read_unlock_special.b.deferred_qs = false; > if (special.b.need_qs) { > rcu_qs(); > @@ -626,7 +627,6 @@ static void rcu_read_unlock_special(struct task_struct *t) > local_irq_restore(flags); > return; > } > - WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false); And reaching the above assignment is inevitable at this point, so this assignment can go. But can't we also get rid of the assignment under the preceding "if" statement? Please see below for a version of your patch that includes this proposed change along with an updated commit log. Please let me know if I have messed anything up. > rcu_preempt_deferred_qs_irqrestore(t, flags); > } ------------------------------------------------------------------------ commit 6e3f2b1d6aba24ad6ae8deb2ce98b93d527227ae Author: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx> Date: Fri Nov 1 04:06:22 2019 -0700 rcu: Clear .exp_hint only when deferred quiescent state has been reported Currently, the .exp_hint flag is cleared in rcu_read_unlock_special(), which works, but which can also prevent subsequent rcu_read_unlock() calls from helping expedite the quiescent state needed by an ongoing expedited RCU grace period. This commit therefore defers clearing of .exp_hint from rcu_read_unlock_special() to rcu_preempt_deferred_qs_irqrestore(), thus ensuring that intervening calls to rcu_read_unlock() have a chance to help end the expedited grace period. Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index d4c4824..677ee1c 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -439,6 +439,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) * t->rcu_read_unlock_special cannot change. */ special = t->rcu_read_unlock_special; + t->rcu_read_unlock_special.b.exp_hint = false; rdp = this_cpu_ptr(&rcu_data); if (!special.s && !rdp->exp_deferred_qs) { local_irq_restore(flags); @@ -610,7 +611,6 @@ static void rcu_read_unlock_special(struct task_struct *t) struct rcu_data *rdp = this_cpu_ptr(&rcu_data); struct rcu_node *rnp = rdp->mynode; - t->rcu_read_unlock_special.b.exp_hint = false; exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) || (rdp->grpmask & rnp->expmask) || tick_nohz_full_cpu(rdp->cpu); @@ -640,7 +640,6 @@ static void rcu_read_unlock_special(struct task_struct *t) local_irq_restore(flags); return; } - WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false); rcu_preempt_deferred_qs_irqrestore(t, flags); }