Re: SRCU: kworker hung in synchronize_srcu

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Sat, 30 Sep 2023 20:19:25 -0400

On Sat, Sep 30, 2023 at 6:01 AM Neeraj upadhyay <neeraj.iitr10@xxxxxxxxx> wrote:
>
> On Fri, Sep 29, 2023 at 3:35 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> > Firstly, kudos to the detailed report and analysis. Rare failures are
> > hard and your usage crash/kdump is awesome to dig deeper into the
> > issue..
> >
> > On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 <zhuangel570@xxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > We encounter SRCU hung issue in stable tree 5.4.203, we are running VM create
> > > and destroy concurrent test, the issue happens after several weeks. Now we
> > > didn't have a way to reproduce this issue, the issue happens randomly, this
> > > is the second time we found it in this year.
> > >
> > >
> > >
> > > CASE1: entry of CPU 136 belongs to GP 288 was accelerated to GP 292
> > > - [CPU 136] [GP 280-284] finished, yet not enter srcu_invoke_callbacks.
> > > - [CPU 136] [GP 284-288] starting, new synchronize_srcu request, queue entry
> > >   to SDP.
> > > - [CPU 041] [GP 284-288] starting, new synchronize_srcu request, workload run
> > >   faster than CPU 136, start GP, set rcu_seq_start.
> > > - [CPU 136] [GP 284-288] starting, call srcu_funnel_gp_start, found no need
> > >   to start GP.
> > > - [CPU 136] [GP 280-284] finished, start to run srcu_invoke_callbacks,
> > >   "accelerate" the seq of new added entry to 292 (it should be 288).
> >
> > But srcu_gp_seq is at 304 right now. How does it matter that the CB is
> > marked for 292? It should be ready to execute anyway even at 292. Note
> > the meaning of "acceleration", the idea is to start conservatively and
> > move the callbacks forward as more accurate information is available.
> > Considering this, 292 initially should be OK IMHO (that's just more
> > conservative than 288)..
> >
>
> Maybe it matters, as for a CPU, the callbacks will only be scheduled
> in srcu_gp_end() for the GPs, for which it has updated ->srcu_data_have_cbs[idx]
> and ->srcu_have_cbs[idx]

Right but if I am looking at the code correctly, nothing guarantees
that srcu_invoke_callbacks is called before srcu_gp_seq can advance.
So all callbacks that were previously queued for older grace periods
should be run whenever srcu_invoke_callbacks() eventually runs. That's
why I was thinking that part looked normal to me (segments having
older GP numbers).

> > >
> > > /*
> > >  * CASE2
> > >  * - entry of CPU 136 belongs to GP 288 was accelerated to GP 296.
> > >  * - GP0: 280-284, GP1: 284-288, GP2: 288-292.
> > >  */
> > >
> > > /* [GP0][CPU-136] */
> > > process_srcu {
> > >  srcu_gp_end
> > > }
> > >
> > >                   /* [GP1][CPU-136] */
> > >                   synchronize_srcu {
> > >                    __call_srcu {
> > >                     rcu_segcblist_enqueue
> > >                                            /* [GP1][CPU-041] */
> > >                                            synchronize_srcu {
> > >                                             __call_srcu {
> > >                                              srcu_funnel_gp_start
> > >                                               srcu_gp_start
> > >                                             }
> > >                                            }
> > >                                            process_srcu {
> > >                                             srcu_gp_end
> > >                                              rcu_seq_end
> > >                                            }
> > >                   /* [GP1][CPU-136] */
> > >                     srcu_funnel_gp_start
> > >                    }
> > >                   }
> > > /* [GP0][CPU-136] */
> > > srcu_invoke_callbacks {
> >
> > If srcu_invoke_callbacks() was really called for the rdp, I would have
> > expected rcu_segcblist_advance() to advance all those pending
> > callbacks to 304.
>
> If the callback is in NEXT_TAIL and not assigned GP num,
> rcu_segcblist_advance() won't move it and next accelerate in
> srcu_invoke_callbacks() will
> assign it the next gp sequence num.

Sure, and after that again it will call srcu_schedule_cbs_sdp() so
that should be fine and the next workqueue invocation
srcu_invoke_callbacks() can advance at that time. Right?

if (more) {
  srcu_schedule_cbs_sdp(sdp, 0);
}

thanks,

 - Joel