Re: [RFC] rcu/nocb: Fix possible bugs in rcu_barrier()

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Sun, 18 Sep 2022 18:17:21 -0400

On 9/18/2022 6:12 PM, Joel Fernandes (Google) wrote:
> When going through the lazy-rcu work, I noticed that
> rcu_barrier_entrain() does not really wake up the rcuog GP thread in any
> path after entraining. This means it is possible the GP thread is not
> awakened soon (say there were no CBs in the cblist after entraining
> time).
> 
> Further, nothing appears to be calling the rcu_barrier callback
> directly in the case the ->cblist was empty which means if the IPI gets
> delayed enough to make the ->cblist empty and it turns out to be the last
> CPU holding, then nothing calls completes rcu_state.barrier_completion.
> 
> Fix both these issues.
> 
> A note on the wakeup, there are 3 cases AFAICS after the call to
> rcu_nocb_flush_bypass():
> 
> 1. The rdp->cblist has pending CBs.
> 
> 2. The rdp->cblist has all done CBs.
> 
> 3. The rdp->cblist has no CBs at all (say the IPI took a long time to
> arrive and some other path dequeued them in the meanwhile).
> 
> For #3, entraining a CB is not needed and we should bail.  For #1 and
> needed. But for #2 it is needed.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
> I only build tested this and wanted to post it in advance for discussions. I
> will test it more soon. Thanks.
> 
>  kernel/rcu/tree.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 18f07e167d5e..65d439286757 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3904,10 +3904,11 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
>  /*
>   * If needed, entrain an rcu_barrier() callback on rdp->cblist.
>   */
> -static void rcu_barrier_entrain(struct rcu_data *rdp)
> +static void rcu_barrier_entrain(struct rcu_data *rdp, unsigned long flags)
>  {
>  	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
>  	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
> +	bool was_alldone;
>  
>  	lockdep_assert_held(&rcu_state.barrier_lock);
>  	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
> @@ -3916,14 +3917,20 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  	rdp->barrier_head.func = rcu_barrier_callback;
>  	debug_rcu_head_queue(&rdp->barrier_head);
>  	rcu_nocb_lock(rdp);
> +	was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +
>  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
>  		atomic_inc(&rcu_state.barrier_cpu_count);
> +		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */

I noticed that __call_rcu_nocb_wake() does a restore of the flags, which may not
work for all callers of rcu_nocb_entrain(). Sigh. The patch needs to be fixed
for that, but at least it highlights the potential issues.