Re: [stable-rc 5.9] sched: core.c:7270 Illegal context switch in RCU-bh read-side critical section!

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Wed, 16 Dec 2020 16:21:12 +0100

On Wed, Dec 16 2020 at 15:55, Naresh Kamboju wrote:
> On Tue, 15 Dec 2020 at 23:52, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>> > Or you could place checks for being in a BH-disable further up in
>> > the code.  Or build with CONFIG_DEBUG_INFO=y to allow more precise
>> > interpretation of this stack trace.
>
> I will try to reproduce this warning with DEBUG_INFO=y enabled kernel and
> get back to you with a better crash log.
>
>>
>> My money would be on the option that whatever run on this workqueue
>> before forgot to re-enable BH, but we already have a check for that...
>> Naresh, do you have the full log? Is there nothing like "BUG: workqueue
>> leaked lock" above the splat?

No, because the splat happens in the middle of the work. The workqueue
check only triggers once the work has finished.

So cleanup_net() does

   ....
   synchronize_rcu();   <- might sleep. So up to here it should be fine.

   list_for_each_entry_continue_reverse(ops, &pernet_list, list)
   	ops_exit_list(ops, &net_exit_list);

ops_exit_list() is called for each ops which then either invokes
ops->exit() or ops->exit_batch().

That means one of those callbacks fails to re-enable BH. Adding a check
for !local_bh_disabled() after each invocation of ops->exit() and
ops->exit_batch() should identify the buggy callback.
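
To illustrate the idea, here is a rough userspace model of such a check,
not kernel code. Everything in it (bh_disable_depth, model_bh_disable(),
struct model_ops, run_exit_checked()) is made up for the example; in the
kernel the check would sit in ops_exit_list() and look at the real
softirq-disable state instead of a plain counter.

```c
/* Userspace model only: none of these names are kernel definitions. */
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the BH-disable nesting count. */
static int bh_disable_depth;

static void model_bh_disable(void) { bh_disable_depth++; }
static void model_bh_enable(void)  { bh_disable_depth--; }

/* Hypothetical, stripped-down analogue of a pernet ops entry. */
struct model_ops {
	const char *name;
	void (*exit)(void);
};

/* A well-behaved callback: disable and enable are balanced. */
static void good_exit(void)
{
	model_bh_disable();
	model_bh_enable();
}

/* A buggy callback: returns with BH still disabled. */
static void buggy_exit(void)
{
	model_bh_disable();
	/* missing model_bh_enable() */
}

/*
 * Invoke the callbacks in reverse order and check the BH state after
 * each one. Returns the index of the first callback that changed the
 * nesting depth, or -1 if every callback was balanced.
 */
static int run_exit_checked(const struct model_ops *ops, int n)
{
	assert(ops && n >= 0);

	for (int i = n - 1; i >= 0; i--) {
		int before = bh_disable_depth;

		ops[i].exit();
		if (bh_disable_depth != before) {
			fprintf(stderr, "%s returned with BH disabled\n",
				ops[i].name);
			bh_disable_depth = before; /* repair the model state */
			return i;
		}
	}
	return -1;
}
```

Calling run_exit_checked() on a table that contains buggy_exit returns
the index of that entry, which is the information the real check would
put into the warning.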

Thanks,

        tglx