Re: [BUG] Random intermittent boost failures (Was Re: [BUG] TREE04..)


On Fri, Sep 15, 2023 at 12:57 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
[...]
> > > > > On the other hand, I came up with a real fix [1] and I am currently testing it.
> > > > > This is to fix a live lock between RT push and CPU hotplug's
> > > > > select_fallback_rq()-induced push. I am not sure if the fix works but I have
> > > > > some faith based on what I'm seeing in traces. Fingers crossed. I also feel
> > > > > the real fix is needed to prevent these issues even if we're able to hide it
> > > > > by halving the total rcutorture boost threads.
> > > >
> > > > So that fixed it without any changes to RCU. Below is the updated patch also
> > > > for the archives. Though I'm rewriting it slightly differently and testing
> > > > that more. The main thing I am doing in the new patch is I find that RT
> > > > should not select !cpu_active() CPUs since those have the scheduler turned
> > > > off. Though checking for cpu_dying() also works. I could not find any
> > > > instance where cpu_dying() != cpu_active() but there could be a tiny window
> > > > where that is true. Anyway, I'll make some noise with scheduler folks once I
> > > > have the new version of the patch tested.
> > > >
> > > > Also, halving the number of RT boost threads makes the failure less likely
> > > > but does not eliminate it. Not too surprising, since the issue may not be
> > > > related to too many RT threads but rather to a lockup between hotplug and RT.
> > >
> > > Again, looks promising!  When I get the non-RCU -rcu stuff moved to
> > > v6.6-rc1 and appropriately branched and tested, I will give it a go on
> > > the test setup here.
> >
> > Thanks a lot, and I have enclosed a simpler updated patch below which also
> > similarly shows very good results. This is the one I would like to test
> > more and send to scheduler folks. I'll send it out once I have it tested more
> > and also possibly after seeing your results (I am on vacation next week so
> > there's time).
>
> Much nicer!  This is just on current mainline, correct?

Yes, correct. It also applied cleanly to all the stable kernels on my
test rigs except 5.10, which had a small merge conflict that was
trivial to fix.

thanks,

 - Joel



