Re: [BUG] Random intermittent boost failures (Was Re: [BUG] TREE04..)

On Fri, Sep 15, 2023 at 04:37:11PM +0000, Joel Fernandes wrote:
> Hi Paul,
> Thanks! I merged replies to 3 threads into this one to make it easier to follow.
> 
> On Fri, Sep 15, 2023 at 07:53:15AM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 15, 2023 at 11:33:13AM +0000, Joel Fernandes wrote:
> > > On Fri, Sep 15, 2023 at 12:13:31AM +0000, Joel Fernandes wrote:
> [...]
> > > On the other hand, I came up with a real fix [1] and I am currently testing it.
> > > This is to fix a live lock between RT push and CPU hotplug's
> > > select_fallback_rq()-induced push. I am not sure if the fix works but I have
> > > some faith based on what I'm seeing in traces. Fingers crossed. I also feel
> > > the real fix is needed to prevent these issues even if we're able to hide it
> > > by halving the total rcutorture boost threads.
> > 
> > This don't-schedule-on-dying-CPUs approach looks quite promising
> > to me!
> > 
> > Then again, I cannot claim to be a scheduler expert.  And I am a bit
> > surprised that this does not already happen.  Which makes me wonder
> > (admittedly without evidence either way) whether there is some CPU-hotplug
> > race that it might induce.  But then again, figuring this sort of thing
> > out is what part of the scheduler guys are there for, right?  ;-)
> 
> Yes, it looks promising. Actually this sort of thing also seems to be already
> done in CFS; it's just not there in RT. So maybe it is OK. Testing so far
> has shown pretty good results, even with hotplug testing. Here is hoping!
> 
> > > > On the other hand, I came up with a real fix [1] and I am currently testing it.
> > > > This is to fix a live lock between RT push and CPU hotplug's
> > > > select_fallback_rq()-induced push. I am not sure if the fix works but I have
> > > > some faith based on what I'm seeing in traces. Fingers crossed. I also feel
> > > > the real fix is needed to prevent these issues even if we're able to hide it
> > > > by halving the total rcutorture boost threads.
> > > 
> > > So that fixed it without any changes to RCU. Below is the updated patch also
> > > for the archives. Though I'm rewriting it slightly differently and testing
> > > that more. The main change in the new patch is that RT should not select
> > > !cpu_active() CPUs, since those have the scheduler turned off. Though
> > > checking for cpu_dying() also works. I could not find any
> > > instance where cpu_dying() != cpu_active() but there could be a tiny window
> > > where that is true. Anyway, I'll make some noise with scheduler folks once I
> > > have the new version of the patch tested.
> > > 
> > > Also, halving the number of RT boost threads makes the failure less likely
> > > but does not fix it. Not too surprising, since the issue may not be related
> > > to too many RT threads but rather to a lockup between hotplug and RT.
> > 
> > Again, looks promising!  When I get the non-RCU -rcu stuff moved to
> > v6.6-rc1 and appropriately branched and tested, I will give it a go on
> > the test setup here.
> 
> Thanks a lot, and I have enclosed a simpler updated patch below which
> similarly shows very good results. This is the one I would like to test
> further and send to scheduler folks. I'll send it out once testing is done,
> and possibly after seeing your results (I am on vacation next week so
> there's time).
> 
> > > We could run them on just the odd, or even ones and still be able to get
> > > sufficient boost testing. This may be especially important without RT
> > > throttling. I'll go ahead and queue a test like that.
> > > 
> > > Thoughts?
> > 
> > The problem with this is that it will often render RCU priority boosting
> > unnecessary.  Any kthread preempted within an RCU read-side critical
> > section will with high probability quickly be resumed on one of the
> > even-numbered CPUs.
> > 
> > Or were you also planning to bind the rcu_torture_reader() kthreads to
> > a specific CPU, preventing such migration?  Or am I missing something
> > here?
> 
> You are right, I see now why you were running them on all CPUs. One note
> though: if the RCU reader threads are CFS, they will not be immediately
> pushed out, so it might work (unlike if the preempted RCU reader threads
> were RT). However, testing shows thread-halving does not fix the
> issue anyway so we can scratch that. Instead, below is the updated patch for
> don't schedule-on-dying/inactive CPUs approach which is showing really good
> results!
> 
> And I'll most likely see you the week after, after entering into a 4-day
> quiescent state. ;-)

And I queued this as an experimental patch and am starting testing.
I am assuming that the path to mainline is through the scheduler tree.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> ---8<-----------------------
> 
> From: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> Subject: [PATCH] RT: Alternative fix for livelock with hotplug
> 
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
>  kernel/sched/cpupri.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index a286e726eb4b..42c40cfdf836 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
>  
>  	if (lowest_mask) {
>  		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
> +		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
>  
>  		/*
>  		 * We have to ensure that we have at least one bit
> -- 
> 2.42.0.459.ge4e396fd5e-goog
> 


