On Wed, 2019-10-09 at 16:34 +0200, Michal Hocko wrote:
> On Wed 09-10-19 10:19:44, Qian Cai wrote:
> > On Wed, 2019-10-09 at 15:51 +0200, Michal Hocko wrote:
> > [...]
> > > Can you paste the full lock chain graph to be sure we are on the same
> > > page?
> >
> > WARNING: possible circular locking dependency detected
> > 5.3.0-next-20190917 #8 Not tainted
> > ------------------------------------------------------
> > test.sh/8653 is trying to acquire lock:
> > ffffffff865a4460 (console_owner){-.-.}, at:
> > console_unlock+0x207/0x750
> >
> > but task is already holding lock:
> > ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
> > __offline_isolated_pages+0x179/0x3e0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #3 (&(&zone->lock)->rlock){-.-.}:
> >        __lock_acquire+0x5b3/0xb40
> >        lock_acquire+0x126/0x280
> >        _raw_spin_lock+0x2f/0x40
> >        rmqueue_bulk.constprop.21+0xb6/0x1160
> >        get_page_from_freelist+0x898/0x22c0
> >        __alloc_pages_nodemask+0x2f3/0x1cd0
> >        alloc_pages_current+0x9c/0x110
> >        allocate_slab+0x4c6/0x19c0
> >        new_slab+0x46/0x70
> >        ___slab_alloc+0x58b/0x960
> >        __slab_alloc+0x43/0x70
> >        __kmalloc+0x3ad/0x4b0
> >        __tty_buffer_request_room+0x100/0x250
> >        tty_insert_flip_string_fixed_flag+0x67/0x110
> >        pty_write+0xa2/0xf0
> >        n_tty_write+0x36b/0x7b0
> >        tty_write+0x284/0x4c0
> >        __vfs_write+0x50/0xa0
> >        vfs_write+0x105/0x290
> >        redirected_tty_write+0x6a/0xc0
> >        do_iter_write+0x248/0x2a0
> >        vfs_writev+0x106/0x1e0
> >        do_writev+0xd4/0x180
> >        __x64_sys_writev+0x45/0x50
> >        do_syscall_64+0xcc/0x76c
> >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> This one looks indeed legit. pty_write is allocating memory from inside
> the port->lock. But this seems to be quite broken, right? The forward
> progress depends on GFP_ATOMIC allocation which might fail easily under
> memory pressure. So the preferred way to fix this should be to change
> the allocation scheme to use the preallocated buffer and size it from a
> context when it doesn't hold internal locks. It might be a more complex
> fix than using printk_deferred or other games but addressing that would
> make the pty code more robust as well.

I am not really sure that doing surgery on the pty code is better than
fixing the memory offline side as a short-term fix.
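
For reference, the preallocation scheme suggested above (size the buffer
from a context that holds no internal locks, then only copy under the lock)
could look roughly like the sketch below. This is a userspace illustration
only, not tty code: a pthread mutex stands in for port->lock, malloc()
stands in for the kernel allocator, and struct port_buf, port_buf_reserve()
and port_buf_write() are hypothetical names made up for this example.

```c
/* Userspace sketch; every name here is hypothetical, not a tty API. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct port_buf {
	pthread_mutex_t lock;	/* stands in for port->lock */
	char *data;		/* preallocated backing store */
	size_t cap;		/* bytes available in data */
	size_t used;		/* bytes already written */
};

/* Grow the buffer. The allocation happens before the lock is taken. */
static int port_buf_reserve(struct port_buf *pb, size_t want)
{
	char *fresh = malloc(want);	/* may fail, but no lock is held */
	char *old;

	if (!fresh)
		return -1;

	pthread_mutex_lock(&pb->lock);
	if (want <= pb->cap) {		/* already large enough */
		pthread_mutex_unlock(&pb->lock);
		free(fresh);
		return 0;
	}
	if (pb->used)
		memcpy(fresh, pb->data, pb->used);
	old = pb->data;
	pb->data = fresh;
	pb->cap = want;
	pthread_mutex_unlock(&pb->lock);

	free(old);			/* freeing also stays outside the lock */
	return 0;
}

/* Write path: never allocates under the lock, returns a short count. */
static size_t port_buf_write(struct port_buf *pb, const char *src, size_t len)
{
	size_t room, n;

	pthread_mutex_lock(&pb->lock);
	assert(pb->used <= pb->cap);
	room = pb->cap - pb->used;
	n = len < room ? len : room;
	if (n)
		memcpy(pb->data + pb->used, src, n);
	pb->used += n;
	pthread_mutex_unlock(&pb->lock);
	return n;
}
```

With this shape the write path cannot enter the allocator (and hence cannot
reach zone->lock) while the buffer lock is held. The cost is that a writer
may get a short write and has to reserve more room before retrying.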