On Tue, Aug 27, 2024 at 10:33:13AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 27, 2024 at 05:41:52PM +0200, Valentin Schneider wrote:
> > On 27/08/24 12:03, Valentin Schneider wrote:
> > > On 26/08/24 09:31, Paul E. McKenney wrote:
> > >> On Mon, Aug 26, 2024 at 01:44:35PM +0200, Valentin Schneider wrote:
> > >>>
> > >>> Woops...
> > >>
> > >> On the other hand, removing that dequeue_task() makes next-20240823
> > >> pass light testing.
> > >>
> > >> I have to ask...
> > >>
> > >> Does it make sense for Valentin to rearrange those commits to fix
> > >> the two build bugs and remove that dequeue_task(), all in the name of
> > >> bisectability?  Or is there something subtle here so that only Peter
> > >> can do this work, shoulder and all?
> > >>
> > >
> > > I suppose at the very least another pair of eyes on this can't hurt, let me
> > > get untangled from some other things first and I'll take a jab at it.
> >
> > I've taken tip/sched/core and shuffled hunks around; I didn't re-order any
> > commit. I've also taken out the dequeue from switched_from_fair() and put
> > it at the very top of the branch which should hopefully help bisection.
> >
> > The final delta between that branch and tip/sched/core is empty, so it
> > really is just shuffling in between commits.
> >
> > Please find the branch at:
> >
> > https://gitlab.com/vschneid/linux.git -b mainline/sched/eevdf-complete-builderr
> >
> > I'll go stare at the BUG itself now.
>
> Thank you!
>
> I have fired up tests on the "BROKEN?" commit.  If that fails, I will
> try its predecessor, and if that fails, I will bisect from e28b5f8bda01
> ("sched/fair: Assert {set_next,put_prev}_entity() are properly balanced"),
> which has stood up to heavy hammering in earlier testing.

And 50 runs of TREE03 on the "BROKEN?" commit resulted in 32 failures.
Of these, 29 were the dequeue_rt_stack() failure.
Two more were RCU CPU stall warnings, and the last one was an oddball
"kernel BUG at kernel/sched/rt.c:1714" followed by an equally oddball
"Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI".

Just to be specific, this is commit:

df8fe34bfa36 ("BROKEN? sched/fair: Dequeue sched_delayed tasks when switching from fair")

This commit's predecessor is this commit:

2f888533d073 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")

This predecessor commit passes 50 runs of TREE03 with no failures.

So that addition of that dequeue_task() call to the switched_from_fair()
function is looking quite suspicious to me.  ;-)

Thanx, Paul