On Mon 03-04-23 20:14:28, Tetsuo Handa wrote: > On 2023/04/03 17:15, Michal Hocko wrote: > > Is this > > https://lore.kernel.org/all/0000000000001d74d205f7c1821f@xxxxxxxxxx/ the > > underlying report ? > > Yes. > > > Could you explain the the deadlock scenario? > > build_zonelists() from __build_all_zonelists() calls printk() with > zonelist_update_seq held. > > printk() holds console_owner lock for synchronous printing, and then upon > unlock of console_owner lock, printk() holds port_lock_key and port->lock. > > tty_insert_flip_string_and_push_buffer() from pty_write() conditionally calls > kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held. But since commit 3d36424b3b58, > zonelist_update_seq is checked by GFP_ATOMIC allocation (i.e. a new locking dependency > was added by that commit). > > CPU0 CPU1 > pty_write() { > tty_insert_flip_string_and_push_buffer() { > __build_all_zonelists() { > spin_lock_irqsave(&port->lock, flags); > tty_insert_flip_string() { > tty_insert_flip_string_fixed_flag() { > __tty_buffer_request_room() { > tty_buffer_alloc() { > kmalloc(GFP_ATOMIC | __GFP_NOWARN) { > __alloc_pages_slowpath() { > write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd > // interrupt handler starts > handle_irq() { > serial8250_interrupt() { > serial8250_tx_chars() { > tty_port_tty_get() { > spin_lock_irqsave(&port->lock, flags); // spins here waiting for kmalloc() from tty_insert_flip_string() to complete > zonelist_iter_begin() { > read_seqbegin(&zonelist_update_seq) { > // spins here waiting for interrupt handler to complete if zonelist_update_seq.seqcount is odd > tty = tty_kref_get(port->tty); > spin_unlock_irqrestore(&port->lock, flags); > } > } > } > } > // interrupt handler ends > write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even > } > } > } > } > } > } > } > } > } > spin_unlock_irqrestore(&port->lock, flags); > } > } Thank you! IIUC this can only happen when there is a race with the memory hotplug. So pretty much a very rare event. Also I am not really sure this really requires any changes at the allocator level. I would much rather sacrifice the printk in build_zonelists or pull it out of the locked section. Or would printk_deferred help in this case? -- Michal Hocko SUSE Labs