On Tue 04-04-23 09:37:25, Tetsuo Handa wrote: > syzbot is reporting circular locking dependency which involves > zonelist_update_seq seqlock [1], for this lock is checked by memory > allocation requests which do not need to be retried. > > We somehow need to prevent __alloc_pages_slowpath() from checking > this lock. Since Petr Mladek thinks that __build_all_zonelists() can > become a candidate for deferring printk() [2], let's make sure that > current CPU/thread won't reach __alloc_pages_slowpath() while this lock > is in use. I agree with Petr that the lockdep splat and the lockup explanation should be part of the changelog. Just reuse what you had in previous email. > Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@xxxxxxxxxxxxxxxxxxxxxxxxx> > Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1] > Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") > Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2] > Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxxx> > Cc: Petr Mladek <pmladek@xxxxxxxx> > --- > mm/page_alloc.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 7136c36c5d01..64fa77b8d24a 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data) > int nid; > int __maybe_unused cpu; > pg_data_t *self = data; > + unsigned long flags; > > + /* > + * Since __alloc_pages_slowpath() spins if zonelist_update_seq.seqcount > + * is odd, any memory allocation while zonelist_update_seq.seqcount is > + * odd have to be avoided. > + * > + * Explicitly disable local irqs in order to avoid calling > + * kmalloc(GFP_ATOMIC) from e.g. timer interrupt handler. > + * Also, explicitly prevent printk() from synchronously waiting for > + * port->lock because tty_insert_flip_string_and_push_buffer() might > + * call kmalloc(GFP_ATOMIC | __GFP_NOWARN) while holding port->lock. This explanation of local_irq_save just doesn't make any sense. You do not prevent any other cpu from entering the IRQ and doing the same thing. If the sole purpose of local_irq_save is to conform printk_deferred_enter then state that instead. Although it seems that Petr believes that preempt_disable should be sufficient and then it would be preferred as well. This would require update to the comment for printk_deferred_enter though. Thanks! > + */ > + local_irq_save(flags); > + printk_deferred_enter(); > write_seqlock(&zonelist_update_seq); > > #ifdef CONFIG_NUMA > @@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data) > } > > write_sequnlock(&zonelist_update_seq); > + printk_deferred_exit(); > + local_irq_restore(flags); > } > > static noinline void __init > -- > 2.34.1 -- Michal Hocko SUSE Labs