On Sun, Oct 1, 2023 at 10:27 AM Neeraj upadhyay <neeraj.iitr10@xxxxxxxxx> wrote: > > On Sun, Oct 1, 2023 at 5:49 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > On Sat, Sep 30, 2023 at 6:01 AM Neeraj upadhyay <neeraj.iitr10@xxxxxxxxx> wrote: > > > > > > On Fri, Sep 29, 2023 at 3:35 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > > > > > Hello, > > > > Firstly, kudos to the detailed report and analysis. Rare failures are > > > > hard and your usage crash/kdump is awesome to dig deeper into the > > > > issue.. > > > > > > > > On Thu, Sep 28, 2023 at 3:59 AM zhuangel570 <zhuangel570@xxxxxxxxx> wrote: > > > > > > > > > > Hi, > > > > > > > > > > We encounter SRCU hung issue in stable tree 5.4.203, we are running VM create > > > > > and destroy concurrent test, the issue happens after several weeks. Now we > > > > > didn't have a way to reproduce this issue, the issue happens randomly, this > > > > > is the second time we found it in this year. > > > > > > > > > > > > > > > > > > > > CASE1: entry of CPU 136 belongs to GP 288 was accelerated to GP 292 > > > > > - [CPU 136] [GP 280-284] finished, yet not enter srcu_invoke_callbacks. > > > > > - [CPU 136] [GP 284-288] starting, new synchronize_srcu request, queue entry > > > > > to SDP. > > > > > - [CPU 041] [GP 284-288] starting, new synchronize_srcu request, workload run > > > > > faster than CPU 136, start GP, set rcu_seq_start. > > > > > - [CPU 136] [GP 284-288] starting, call srcu_funnel_gp_start, found no need > > > > > to start GP. > > > > > - [CPU 136] [GP 280-284] finished, start to run srcu_invoke_callbacks, > > > > > "accelerate" the seq of new added entry to 292 (it should be 288). > > > > > > > > But srcu_gp_seq is at 304 right now. How does it matter that the CB is > > > > marked for 292? It should be ready to execute anyway even at 292. Note > > > > the meaning of "acceleration", the idea is to start conservatively and > > > > move the callbacks forward as more accurate information is available. > > > > Considering this, 292 initially should be OK IMHO (that's just more > > > > conservative than 288).. > > > > > > > > > > Maybe it matters, as for a CPU, the callbacks will only be scheduled > > > in srcu_gp_end() for the GPs, for which it has updated ->srcu_data_have_cbs[idx] > > > and ->srcu_have_cbs[idx] > > > > Right but if I am looking at the code correctly, nothing guarantees > > that srcu_invoke_callbacks is called before srcu_gp_seq can advance. > > So all callbacks that were previously queued for older grace periods > > should be run whenever srcu_invoke_callbacks() eventually runs. That's > > why I was thinking that part looked normal to me (segments having > > older GP numbers). > > > > > > > > > > > > /* > > > > > * CASE2 > > > > > * - entry of CPU 136 belongs to GP 288 was accelerated to GP 296. > > > > > * - GP0: 280-284, GP1: 284-288, GP2: 288-292. > > > > > */ > > > > > > > > > > /* [GP0][CPU-136] */ > > > > > process_srcu { > > > > > srcu_gp_end > > > > > } > > > > > > > > > > /* [GP1][CPU-136] */ > > > > > synchronize_srcu { > > > > > __call_srcu { > > > > > rcu_segcblist_enqueue > > > > > /* [GP1][CPU-041] */ > > > > > synchronize_srcu { > > > > > __call_srcu { > > > > > srcu_funnel_gp_start > > > > > srcu_gp_start > > > > > } > > > > > } > > > > > process_srcu { > > > > > srcu_gp_end > > > > > rcu_seq_end > > > > > } > > > > > /* [GP1][CPU-136] */ > > > > > srcu_funnel_gp_start > > > > > } > > > > > } > > > > > /* [GP0][CPU-136] */ > > > > > srcu_invoke_callbacks { > > > > > > > > If srcu_invoke_callbacks() was really called for the rdp, I would have > > > > expected rcu_segcblist_advance() to advance all those pending > > > > callbacks to 304. > > > > > > If the callback is in NEXT_TAIL and not assigned GP num, > > > rcu_segcblist_advance() won't move it and next accelerate in > > > srcu_invoke_callbacks() will > > > assign it the next gp sequence num. > > > > Sure, and after that again it will call srcu_schedule_cbs_sdp() so > > that should be fine and the next workqueue invocation > > srcu_invoke_callbacks() can advance at that time. Right? > > > > But "more" only checks for CBs in DONE tail. The callbacks which have been just > accelerated are not advanced to DONE tail. > > Having said above, I am still trying to figure out, which callbacks > are actually being pointed > by NEXT tail. Given that __call_srcu() already does a advance and > accelerate, all enqueued > callbacks would be in either WAIT tail (the callbacks for current > active GP) or NEXT_READY > tail (the callbacks for next GP after current one completes). So, they > should already have > GP num assigned and srcu_invoke_callbacks() won't accelerate them. > Only case I can > think of is, if current GP completes after we sample > rcu_seq_current(&ssp->srcu_gp_seq) for > rcu_segcblist_advance() (so, WAIT tail cbs are not moved to DONE tail) > and a new GP is started > before we take snapshot ('s') of next GP for > rcu_segcblist_accelerate(), then the gp num 's' > > gp num of NEXT_READY_TAIL and will be put in NEXT tail. Not sure > if my understanding is correct here. Thoughts? > > __call_srcu() > rcu_segcblist_advance(&sdp->srcu_cblist, > rcu_seq_current(&ssp->srcu_gp_seq)); > s = rcu_seq_snap(&ssp->srcu_gp_seq); > (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s); Thanks for helping to confirm this! What I want to explain in my CASE (maybe is not totally correct, I "created" it which base the hung context), is rcu_segcblist_advance and rcu_segcblist_accelerate in srcu_invoke_callbacks maybe using not-matched GP seq. And this makes the callbacks "accelerate" to wrong GP, which has no CBS event on that SNP (CPU groups), then callback leaked in segcglist, cause __synchronize_srcu hang. > > Thanks > Neeraj > > > > if (more) { > > srcu_schedule_cbs_sdp(sdp, 0); > > } > > > > thanks, > > > > - Joel