On Mon, Jul 19, 2021 at 10:24 AM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
>
> On Mon, Jul 19, 2021 at 9:53 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Sun, Jul 18, 2021 at 11:51:36PM +0100, Matthew Wilcox wrote:
> > > On Sun, Jul 18, 2021 at 02:59:14PM -0700, Paul E. McKenney wrote:
> > > > > > https://lore.kernel.org/lkml/CAK2bqVK0Q9YcpakE7_Rc6nr-E4e2GnMOgi5jJj=_Eh_1k
> > > > > > EHLHA@xxxxxxxxxxxxxx/
> > > >
> > > > But this one does show this warning in v5.12.17:
> > > >
> > > > 	WARN_ON_ONCE(!preempt && rcu_preempt_depth() > 0);
> > > >
> > > > This is in rcu_note_context_switch(), and could be caused by something
> > > > like a schedule() within an RCU read-side critical section.  This would
> > > > of course be RCU-usage bugs, given that you are not permitted to block
> > > > within an RCU read-side critical section.
> > > >
> > > > I suggest checking the functions in the stack trace to see where the
> > > > rcu_read_lock() is hiding.  CONFIG_PROVE_LOCKING might also be helpful.
> > >
> > > I'm not sure I see it in this stack trace.
> > >
> > > Is it possible that there's something taking the rcu read lock in an
> > > interrupt handler, then returning from the interrupt handler without
> > > releasing the rcu lock?  Do we have debugging that would fire if
> > > somebody did this?
> >
> > Lockdep should complain, but in the absence of lockdep I don't know
> > that anything would gripe in this situation.
> I think Lockdep should complain.
> Meanwhile, I examined the 5.12.17 by naked eye, and found a suspicious place
I also examined 5.13.2; the unpaired rcu_read_lock() is still there.
> that could possibly trigger that problem:
>
> struct swap_info_struct *get_swap_device(swp_entry_t entry)
> {
> 	struct swap_info_struct *si;
> 	unsigned long offset;
>
> 	if (!entry.val)
> 		goto out;
> 	si = swp_swap_info(entry);
> 	if (!si)
> 		goto bad_nofile;
>
> 	rcu_read_lock();
> 	if (data_race(!(si->flags & SWP_VALID)))
> 		goto unlock_out;
> 	offset = swp_offset(entry);
> 	if (offset >= si->max)
> 		goto unlock_out;
>
> 	return si;
> bad_nofile:
> 	pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val);
> out:
> 	return NULL;
> unlock_out:
> 	rcu_read_unlock();
> 	return NULL;
> }
> I guess the function "return si" without a rcu_read_unlock.
>
> However the get_swap_device has changed in the mainline tree,
> there is no rcu_read_lock anymore.
>
> >
> > Also, this is a preemptible kernel, so it is possible to trace
> > __rcu_read_lock(), if that helps.
> >
> > 							Thanx, Paul
> Thanx
> Zhouyi
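
To make the suspected failure mode concrete, here is a minimal userspace
sketch of the nesting-depth condition that the quoted WARN_ON_ONCE() checks.
It is only a model: every name in it (model_depth, model_get_dev() and so on)
is hypothetical, it uses no kernel API, and it does not claim to reflect how
the kernel actually tracks rcu_preempt_depth(). It only mirrors the control
flow of the function quoted above, where the failure path unlocks and the
success path returns with the read-side section still open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name here is hypothetical, not a kernel API. */

static int model_depth;	/* stands in for rcu_preempt_depth() */

static void model_read_lock(void)
{
	model_depth++;
}

static void model_read_unlock(void)
{
	assert(model_depth > 0);	/* catch an unlock with no matching lock */
	model_depth--;
}

/* Same shape as the quoted check: true means the warning would fire. */
static bool model_switch_would_warn(bool preempt)
{
	return !preempt && model_depth > 0;
}

struct model_dev {
	bool valid;
};

/*
 * Same control flow as the quoted function: the failure path unlocks,
 * the success path returns with the read-side section still open.
 */
static struct model_dev *model_get_dev(struct model_dev *dev)
{
	if (!dev)
		return NULL;

	model_read_lock();
	if (!dev->valid) {
		model_read_unlock();
		return NULL;
	}
	return dev;	/* depth is still 1 here */
}
```

In this model, a successful model_get_dev() leaves model_depth at 1, so
model_switch_would_warn(false) stays true until some caller performs the
matching model_read_unlock(). If a voluntary context switch happens in
between, that is the condition the warning reports.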