Re: One-off rcu_nocb_rdp_deoffload bug

On Wed, Sep 04, 2024 at 03:48:02PM +0200, Frederic Weisbecker wrote:
> On Wed, Sep 04, 2024 at 05:59:46AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 20, 2024 at 10:46:43AM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 20, 2024 at 04:43:39PM +0200, Frederic Weisbecker wrote:
> > > > Sorry for the html mail, I only have my phone ...
> > > > 
> > > > On Tue, Aug 20, 2024 at 13:07, Z qiang <qiang.zhang1211@xxxxxxxxx> wrote:
> > > > 
> > > > > >
> > > > > > Hello, Frederic,
> > > > > >
> > > > > > I have seen this once on Neeraj's tree with a few commits on top (-rcu
> > > > > > commit 46774278c74f ("rcutorture: Test start-poll primitives with
> > > > > > interrupts disabled")).  But only the once so far.
> > > > > >
> > > > > > This is the WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist)) in
> > > > > > rcu_nocb_rdp_deoffload().
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > >
> > > > > In rcu_do_batch(), rcu_segcblist_extract_done_cbs() doesn't decrement
> > > > > rsclp->len.  So after the rcu_barrier() callback has been executed,
> > > > > but before rsclp->len is decremented, rcu_barrier() may return and
> > > > > rcu_nocb_rdp_deoffload() may then check
> > > > > rcu_segcblist_n_cbs(&rdp->cblist) and see a nonzero count.
> > > > >
> > > > 
> > > > That sounds plausible! You just unlocked my thoughts, which had been
> > > > running in circles since yesterday.
> > > > 
> > > > 
> > > > > Maybe we could use WARN_ON_ONCE(rcu_segcblist_n_segment_cbs()) instead
> > > > > of WARN_ON_ONCE(rcu_segcblist_n_cbs())?
> > > > >
> > > > > Thoughts?
> > > > 
> > > > I'll test that once I'm back from vacation on September 2nd. Thanks!
> > > 
> > > Thank you both!!!
> > > 
> > > Running -next over last night hit a number of boot-time splats, so I
> > > have no idea if this reproduces nicely.  Can't have everything!  ;-)
> > 
> > And it is now a two-off given another one in last night's testing.  This was
> > from 168 hours of TREE01 on my -rcu "dev" branch (as opposed to -next),
> > but I have run many runs over the past two weeks.  So it is reproducible,
> > but rare.
> > 
> > Ah, and if it matters, I synched up to Neeraj's latest as of about 18
> > hours ago just before starting this test.
> 
> Yes, I'm preparing an update for the offending patch (which has one more
> embarrassing issue that I found while going through it again).

Very good, thank you!

							Thanx, Paul
