On Sun, Apr 04, 2021 at 09:48:08AM -0700, Paul E. McKenney wrote: > On Sun, Apr 04, 2021 at 11:24:57AM +0100, Matthew Wilcox wrote: > > On Sat, Apr 03, 2021 at 09:15:17PM -0700, syzbot wrote: > > > HEAD commit: 2bb25b3a Merge tag 'mips-fixes_5.12_3' of git://git.kernel.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1284cc31d00000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dde0cc33951735441301 > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+dde0cc33951735441301@xxxxxxxxxxxxxxxxxxxxxxxxx > > > > > > WARNING: suspicious RCU usage > > > 5.12.0-rc5-syzkaller #0 Not tainted > > > ----------------------------- > > > kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0 > > > no locks held by systemd-udevd/4825. > > > > I think we have something that's taking the RCU read lock in > > (soft?) interrupt context and not releasing it properly in all > > situations. This thread doesn't have any locks recorded, but > > lock_is_held(&rcu_bh_lock_map) is true. > > > > Is there some debugging code that could find this? eg should > > lockdep_softirq_end() check that rcu_bh_lock_map is not held? > > (if it's taken in process context, then BHs can't run, so if it's > > held at softirq exit, then there's definitely a problem). > > Something like the (untested) patch below? Maybe? Will this tell us who took the lock? I was really trying to throw out a suggestion in the hope that somebody who knows this area better than I do would tell me I was wrong. > Please note that it does not make sense to also check for > either rcu_lock_map or rcu_sched_lock_map because either of > these might be held by the interrupted code. Yes! Although if we do it somewhere like tasklet_action_common(), we could do something like: +++ b/kernel/softirq.c @@ -774,6 +774,7 @@ static void tasklet_action_common(struct softirq_action *a, while (list) { struct tasklet_struct *t = list; + unsigned long rcu_lockdep = rcu_get_lockdep_state(); list = list->next; @@ -790,6 +791,10 @@ static void tasklet_action_common(struct softirq_action *a, } tasklet_unlock(t); } + if (rcu_lockdep != rcu_get_lockdep_state()) { + printk(something useful about t); + RCU_LOCKDEP_WARN(... something else useful ...); + } local_irq_disable(); where rcu_get_lockdep_state() returns a bitmap of whether the four rcu lockdep maps are held. We might also need something similar in __do_softirq(), in case it's not a tasklet that's the problem.