On Wed, Aug 02, 2023 at 08:45:06AM -0700, Guenter Roeck wrote:
> On 8/2/23 08:05, Paul E. McKenney wrote:
> > On Wed, Aug 02, 2023 at 02:57:56PM +0100, Roy Hopkins wrote:
> > > On Tue, 2023-08-01 at 12:11 -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 01, 2023 at 10:32:45AM -0700, Guenter Roeck wrote:
> > > >
> > > > Please see below for my preferred fix. Does this work for you guys?
> > > >
> > > > Back to figuring out why recent kernels occasionally blow up all
> > > > rcutorture guest OSes...
> > > >
> > > > Thanx, Paul
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > index 7294be62727b..2d5b8385c357 100644
> > > > --- a/kernel/rcu/tasks.h
> > > > +++ b/kernel/rcu/tasks.h
> > > > @@ -570,10 +570,12 @@ static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
> > > >  	if (unlikely(midboot)) {
> > > >  		needgpcb = 0x2;
> > > >  	} else {
> > > > +		mutex_unlock(&rtp->tasks_gp_mutex);
> > > >  		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
> > > >  		rcuwait_wait_event(&rtp->cbs_wait,
> > > >  				   (needgpcb = rcu_tasks_need_gpcb(rtp)),
> > > >  				   TASK_IDLE);
> > > > +		mutex_lock(&rtp->tasks_gp_mutex);
> > > >  	}
> > > >  
> > > >  	if (needgpcb & 0x2) {
> > >
> > > Your preferred fix looks good to me.
> > >
> > > With the original code I can quite easily reproduce the problem on my
> > > system every 10 reboots or so. With your fix in place the problem no
> > > longer occurs.
> >
> > Very good, thank you! May I add your Tested-by?
> >
>
> FWIW, I am still working on it.
> So far I get
>
> [ 8.191589] KTAP version 1
> [ 8.191769] # Subtest: kunit_executor_test
> [ 8.191972] # module: kunit
> [ 8.192012] 1..8
> [ 8.197643] ok 1 parse_filter_test
> [ 8.201851] ok 2 filter_suites_test
> [ 8.206713] ok 3 filter_suites_test_glob_test
> [ 8.211806] ok 4 filter_suites_to_empty_test
> [ 8.214077] kunit executor: filter operation not found: speed>slow, module!=example
> [ 8.217933] # parse_filter_attr_test: ASSERTION FAILED at lib/kunit/executor_test.c:126
> [ 8.217933] Expected err == 0, but
> [ 8.217933] err == -22 (0xffffffffffffffea)
> [ 8.217933]
> [ 8.217933] failed to parse filter '(efault)'
> [ 8.221266] not ok 5 parse_filter_attr_test
> [ 8.224224] kunit executor: filter operation not found: speed>slow
> [ 8.225837] # filter_attr_test: ASSERTION FAILED at lib/kunit/executor_test.c:165
> [ 8.225837] Expected err == 0, but
> [ 8.225837] err == -22 (0xffffffffffffffea)
> [ 8.228850] not ok 6 filter_attr_test
> [ 8.230942] kunit executor: filter operation not found: module!=dummy
> [ 8.232167] # filter_attr_empty_test: ASSERTION FAILED at lib/kunit/executor_test.c:190
> [ 8.232167] Expected err == 0, but
> [ 8.232167] err == -22 (0xffffffffffffffea)
> [ 8.235317] not ok 7 filter_attr_empty_test
> [ 8.237065] kunit executor: filter operation not found: speed>slow
> [ 8.238796] # filter_attr_skip_test: ASSERTION FAILED at lib/kunit/executor_test.c:209
> [ 8.238796] Expected err == 0, but
> [ 8.238796] err == -22 (0xffffffffffffffea)
> [ 8.241897] not ok 8 filter_attr_skip_test
> [ 8.241947] # kunit_executor_test: pass:4 fail:4 skip:0 total:8
> [ 8.242144] # Totals: pass:4 fail:4 skip:0 total:8
>
> and it looks like the console no longer works. Most likely this is some other problem
> that was introduced while tests were broken. It will take me some time to track that down.

No rush.
Given that this bug is a year old, that it happens only when debug options are enabled, and that it has only been seen in current -next, my plan is to submit the fix during the next merge window. So this one stays mutable for about another 10 days. On the strength of Roy's Tested-by, however, I will push this patch into -next soon, so that should make things a bit easier. Or so I hope.

And again, thank you all for tracking this down!

Thanx, Paul
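The patch quoted above drops rtp->tasks_gp_mutex before sleeping in rcuwait_wait_event() and retakes it once callbacks show up, so the mutex is no longer held for the whole wait. What follows is a minimal userspace analogue of that lock-dropping pattern using POSIX threads; it is not kernel code. The names gp_mutex, cbs_lock, cbs_cond, gp_thread(), poster_thread() and run_demo() are hypothetical stand-ins, and the "other task that needs the mutex before it can queue a callback" is an assumed scenario for illustration, not a description of the actual boot-time failure.

```c
/* Minimal userspace sketch (not kernel code) of dropping a mutex across
 * a blocking wait, in the spirit of the rcu_tasks_one_gp() patch.
 * Build with: cc -pthread demo.c
 * All names below are hypothetical stand-ins. */
#include <assert.h>
#include <pthread.h>

/* Stand-in for rtp->tasks_gp_mutex. */
static pthread_mutex_t gp_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for rtp->cbs_wait: a counter guarded by its own lock. */
static pthread_mutex_t cbs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cbs_cond = PTHREAD_COND_INITIALIZER;
static int pending_cbs;
static int gps_done;

/* Grace-period side: roughly the non-midboot branch of the patch. */
static void *gp_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gp_mutex);
	/* ... state that needs gp_mutex would be updated here ... */

	/* Drop gp_mutex before sleeping (cf. the added mutex_unlock()). */
	pthread_mutex_unlock(&gp_mutex);

	/* Sleep until a callback is queued (cf. rcuwait_wait_event()). */
	pthread_mutex_lock(&cbs_lock);
	while (pending_cbs == 0)
		pthread_cond_wait(&cbs_cond, &cbs_lock);
	pending_cbs--;
	pthread_mutex_unlock(&cbs_lock);

	/* Retake gp_mutex afterwards (cf. the added mutex_lock()). */
	pthread_mutex_lock(&gp_mutex);
	gps_done++;
	pthread_mutex_unlock(&gp_mutex);
	return NULL;
}

/* Hypothetical other task that must hold gp_mutex before it can queue
 * a callback; it would block forever if gp_thread slept holding it. */
static void *poster_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gp_mutex);
	pthread_mutex_lock(&cbs_lock);
	pending_cbs++;
	pthread_cond_signal(&cbs_cond);
	pthread_mutex_unlock(&cbs_lock);
	pthread_mutex_unlock(&gp_mutex);
	return NULL;
}

/* Runs both threads once and returns the number of completed
 * "grace periods". Intended to be called a single time. */
int run_demo(void)
{
	pthread_t gp, poster;
	int rc;

	rc = pthread_create(&gp, NULL, gp_thread, NULL);
	assert(rc == 0);
	rc = pthread_create(&poster, NULL, poster_thread, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_join(poster, NULL);
	pthread_join(gp, NULL);
	return gps_done;
}
```

Called once, run_demo() should return 1. If gp_thread() instead kept gp_mutex locked across the pthread_cond_wait() loop, poster_thread() would block on gp_mutex forever, no callback would ever be queued, and the two threads would deadlock. The separate cbs_lock is never held while waiting for gp_mutex, which keeps the lock ordering acyclic.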