On Fri 23-06-23 19:12:32, Sebastian Andrzej Siewior wrote:
> __build_all_zonelists() acquires zonelist_update_seq by first disabling
> interrupts via local_irq_save() and then acquiring the seqlock with
> write_seqlock(). This is troublesome and leads to problems on
> PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping
> lock on PREEMPT_RT and must not be acquired with disabled interrupts.
>
> The API provides write_seqlock_irqsave() which does the right thing in
> one step.
> printk_deferred_enter() has to be invoked in a non-migratable context to
> ensure that deferred printing is enabled and disabled on the same CPU.
> This is the case after zonelist_update_seq has been acquired.
>
> There was discussion on the first submission that the order should be:
>	local_irq_disable();
>	printk_deferred_enter();
>	write_seqlock();
>
> to avoid pitfalls like having an unaccounted printk() coming from
> write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
> only origin of such a printk() can be a lockdep splat because the
> lockdep annotation happens after the sequence count is incremented.
> This is exceptional and subject to change.
>
> It was also pointed out that PREEMPT_RT can be affected by the printk
> problem since its write_seqlock_irqsave() does not really disable
> interrupts. This isn't the case because PREEMPT_RT's printk
> implementation differs from the mainline implementation in two important
> aspects:
> - Printing happens in a dedicated thread and not during the
>   invocation of printk().
> - In emergency cases where synchronous printing is used, a different
>   driver is used which does not use tty_port::lock.
>
> Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
> printk output.
>
> Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

Thanks for extending the changelog. This is much clearer IMO. One nit
below which I hadn't noticed before. Anyway

Acked-by: Michal Hocko <mhocko@xxxxxxxx>

> ---
>  mm/page_alloc.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 47421bedc12b7..99b7e7d09c5c0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5808,11 +5808,10 @@ static void __build_all_zonelists(void *data)
>  	unsigned long flags;
>
>  	/*
> -	 * Explicitly disable this CPU's interrupts before taking seqlock
> -	 * to prevent any IRQ handler from calling into the page allocator
> -	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
> +	 * The zonelist_update_seq must be acquired with irqsave because the
> +	 * reader can be invoked from IRQ with GFP_ATOMIC.
>  	 */
> -	local_irq_save(flags);
> +	write_seqlock_irqsave(&zonelist_update_seq, flags);
>  	/*
>  	 * Explicitly disable this CPU's synchronous printk() before taking
>  	 * seqlock to prevent any printk() from trying to hold port->lock, for

This is not the case anymore because the locking order has flipped. I
would just extend the comment above by something like:

 * Also disable synchronous printk() to prevent any printk() from trying
 * to hold port->lock, for tty_insert_flip_string_and_push_buffer() on
 * other CPU might be calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with
 * port->lock held.
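
Just for illustration (a sketch only, not a replacement hunk), the
locking skeleton of __build_all_zonelists() would then read roughly:

	unsigned long flags;

	/*
	 * The zonelist_update_seq must be acquired with irqsave because the
	 * reader can be invoked from IRQ with GFP_ATOMIC.
	 *
	 * Also disable synchronous printk() to prevent any printk() from trying
	 * to hold port->lock, for tty_insert_flip_string_and_push_buffer() on
	 * other CPU might be calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with
	 * port->lock held.
	 */
	write_seqlock_irqsave(&zonelist_update_seq, flags);
	printk_deferred_enter();

	/* ... zonelist rebuild elided ... */

	printk_deferred_exit();
	write_sequnlock_irqrestore(&zonelist_update_seq, flags);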
> @@ -5820,7 +5819,6 @@ static void __build_all_zonelists(void *data)
>  	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
>  	 */
>  	printk_deferred_enter();
> -	write_seqlock(&zonelist_update_seq);
>
>  #ifdef CONFIG_NUMA
>  	memset(node_load, 0, sizeof(node_load));
> @@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data)
>  #endif
>  	}
>
> -	write_sequnlock(&zonelist_update_seq);
>  	printk_deferred_exit();
> -	local_irq_restore(flags);
> +	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
>  }
>
>  static noinline void __init
> --
> 2.40.1

-- 
Michal Hocko
SUSE Labs