On Mon, Dec 16, 2024 at 11:36:25AM -0800, Paul E. McKenney wrote:
> On Mon, Dec 16, 2024 at 03:38:20PM +0100, Tomas Glozar wrote:
> > On Sun, Dec 15, 2024 at 19:41, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > And the fix for the TREE03 too-short grace periods is finally in, at
> > > least in prototype form:
> > >
> > > https://lore.kernel.org/all/da5065c4-79ba-431f-9d7e-1ca314394443@paulmck-laptop/
> > >
> > > Or this commit on -rcu:
> > >
> > > 22bee20913a1 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> > >
> > > This passes more than 30 hours of 400 concurrent instances of rcutorture's
> > > TREE03 scenario, with modifications that brought the bug reproduction
> > > rate up to 50 per hour. I therefore have strong reason to believe that
> > > this fix is a real fix.
> > >
> > > With this fix in place, a 20-hour run of 400 concurrent instances
> > > of rcutorture's TREE03 scenario resulted in 50 instances of the
> > > enqueue_dl_entity() splat pair. One (untrimmed) instance of this pair
> > > of splats is shown below.
> > >
> > > You guys did reproduce this some time back, so unless you tell me
> > > otherwise, I will assume that you have this in hand. I would of course
> > > be quite happy to help, especially with adding carefully chosen debug
> > > (heisenbug and all that) or testing of alleged fixes.
> > >
> >
> > The same splat was recently reported to LKML [1] and a patchset was
> > sent and merged into tip/sched/urgent that fixes a few bugs around
> > double-enqueue of the deadline server [2]. I'm currently re-running
> > TREE03 with those patches, hoping they will also fix this issue.
>
> Thank you very much!
>
> An initial four-hour test of 400 instances of an enhanced TREE03 ran
> error-free. I would have expected about 10 errors, so this gives me
> 99.9+% confidence that the patches improved things at least a little
> bit and 99% confidence that these patches reduced the error rate by at
> least a factor of two.
>
> I am starting an overnight run. If that completes without error, this
> will provide 99% confidence that these patches reduced the error rate
> by at least an order of magnitude.

And we have that level of confidence!

Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
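
For anyone wanting to check the arithmetic, the confidence figures quoted
above are consistent with a simple Poisson model of the splat rate. The
sketch below assumes that model (it is not necessarily how the numbers
were originally derived) and takes the baseline of 50 splats in 20 hours
from the earlier run. The function name is made up for illustration.

```python
import math

def confidence_rate_reduced(expected_errors, factor):
    """Confidence that the error rate fell by at least `factor`, given a run
    with zero observed errors (assumes Poisson-distributed errors)."""
    # Had the rate fallen by less than `factor`, the expected count would be
    # at least expected_errors / factor, so P(zero errors) <= exp(-that).
    return 1.0 - math.exp(-expected_errors / factor)

# Baseline from the thread: 50 splats in a 20-hour, 400-instance run.
baseline_rate = 50 / 20.0        # splats per hour
expected = baseline_rate * 4     # errors expected in the four-hour test

any_improvement = confidence_rate_reduced(expected, 1)
factor_of_two = confidence_rate_reduced(expected, 2)

# Error-free hours needed for 99% confidence in a 10x reduction:
# solve exp(-(baseline_rate / 10) * hours) = 0.01 for hours.
hours_for_10x = math.log(100) / (baseline_rate / 10)

print(f"expected errors in 4 hours:  {expected:.1f}")
print(f"confidence of any reduction: {any_improvement:.5f}")
print(f"confidence of >= 2x:         {factor_of_two:.4f}")
print(f"hours needed for >= 10x:     {hours_for_10x:.1f}")
```

Under these assumptions the four-hour test gives better than 99.9%
confidence of some improvement and a bit over 99% for a factor of two,
and a bit more than 18 error-free hours would be needed for 99%
confidence in an order-of-magnitude reduction.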