Hi folks,

I just had a test machine livelock when running a concurrent rm -rf
workload on an XFS filesystem with 64k directory block sizes. The
buffer allocation code started reporting this 5 times a second:

XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)

which is in GFP_NOFS|GFP_ZERO context. It is likely to have been a
high order allocation (up to 64k), but there was still plenty of free
memory available (2.8GB of 16GB):

$ free
             total       used       free     shared    buffers     cached
Mem:      16424296   13593732    2830564          0        136       3184
-/+ buffers/cache:   13590412    2833884
Swap:       497976     497976          0
$

But clearly there was no page cache being used. All of the memory in
use was in the inode/dentry caches:

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
9486678 9483271  99%    1.19K 364874       26  11675968K xfs_inode
4820508 4820508 100%    0.21K 130284       37   1042272K dentry
4820224 4820224 100%    0.06K  75316       64    301264K kmalloc-64

The issue is that memory allocation was not making progress - the
shrinkers were not doing anything because they were under GFP_NOFS
allocation context, and kswapd was never woken to take over. The
system was completely out of swap space, and all the CPU was being
burnt in this function:

 44.91%  [kernel]  [k] scan_swap_map

The typical stack trace of a looping memory allocation is this:

[211699.924006] CPU: 2 PID: 21939 Comm: rm Not tainted 3.13.0-dgc+ #172
[211699.924006] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[211699.924006] task: ffff88041a7dde40 ti: ffff8803bbeec000 task.ti: ffff8803bbeec000
[211699.924006] RIP: 0010:[<ffffffff81187dd8>]  [<ffffffff81187dd8>] scan_swap_map+0x118/0x520
[211699.924006] RSP: 0018:ffff8803bbeed508  EFLAGS: 00000297
[211699.924006] RAX: 000000000000a1ba RBX: 0000000000000032 RCX: 0000000000000000
[211699.924006] RDX: 0000000000000001 RSI: 000000000001e64e RDI: 0000000000019def
[211699.924006] RBP: ffff8803bbeed558 R08: 00000000002c6ba0 R09: 0000000000000000
[211699.924006] R10: 57ffb90ace3d4f80 R11: 0000000000019def R12: ffff88041a682900
[211699.924006] R13: 01ffffffffffffff R14: 0000000000000040 R15: ffff88041a6829a0
[211699.924006] FS:  00007fae682c8700(0000) GS:ffff88031bc00000(0000) knlGS:0000000000000000
[211699.924006] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[211699.924006] CR2: 00000000004353b0 CR3: 00000002ea498000 CR4: 00000000000006e0
[211699.924006] Stack:
[211699.924006]  ffff8803bbeed538 ffff88031a9dc720 000000000000a1ba 0000000000004893
[211699.924006]  0000000000000000 ffff88041a682900 0000000000000000 0000000000000001
[211699.924006]  0000000000000000 ffff88041a6829a0 ffff8803bbeed598 ffffffff8118837f
[211699.924006] Call Trace:
[211699.924006]  [<ffffffff8118837f>] get_swap_page+0xef/0x1e0
[211699.924006]  [<ffffffff81184e34>] add_to_swap+0x24/0x70
[211699.924006]  [<ffffffff8115f110>] shrink_page_list+0x300/0xa20
[211699.924006]  [<ffffffff81169089>] ? __mod_zone_page_state+0x49/0x50
[211699.924006]  [<ffffffff8116a3b9>] ? wait_iff_congested+0xa9/0x150
[211699.924006]  [<ffffffff8115fe03>] shrink_inactive_list+0x243/0x480
[211699.924006]  [<ffffffff811606f1>] shrink_lruvec+0x371/0x670
[211699.924006]  [<ffffffff81cdb4ce>] ? _raw_spin_unlock+0xe/0x10
[211699.924006]  [<ffffffff81160dea>] do_try_to_free_pages+0x11a/0x360
[211699.924006]  [<ffffffff81161220>] try_to_free_pages+0x110/0x190
[211699.924006]  [<ffffffff81156422>] __alloc_pages_nodemask+0x5a2/0x8a0
[211699.924006]  [<ffffffff8118fac2>] alloc_pages_current+0xb2/0x170
[211699.924006]  [<ffffffff81151bde>] __get_free_pages+0xe/0x50
[211699.924006]  [<ffffffff8116d199>] kmalloc_order_trace+0x39/0xb0
[211699.924006]  [<ffffffff810cf4c3>] ? finish_wait+0x63/0x80
[211699.924006]  [<ffffffff81197156>] __kmalloc+0x176/0x180
[211699.924006]  [<ffffffff810cf520>] ? __init_waitqueue_head+0x40/0x40
[211699.924006]  [<ffffffff814a74f7>] kmem_alloc+0x77/0xf0
[211699.924006]  [<ffffffff814feb54>] xfs_log_commit_cil+0x3c4/0x5a0
[211699.924006]  [<ffffffff814a6be3>] xfs_trans_commit+0xc3/0x2d0
[211699.924006]  [<ffffffff814e913e>] xfs_remove+0x3be/0x440
[211699.924006]  [<ffffffff811b7d8d>] ? __d_lookup+0x11d/0x170
[211699.924006]  [<ffffffff8149b842>] xfs_vn_unlink+0x52/0xa0
[211699.924006]  [<ffffffff811acc22>] vfs_unlink+0xf2/0x160
[211699.924006]  [<ffffffff811acef0>] do_unlinkat+0x260/0x2a0
[211699.924006]  [<ffffffff811b003b>] SyS_unlinkat+0x1b/0x40
[211699.924006]  [<ffffffff81ce3ea9>] system_call_fastpath+0x16/0x1b

i.e. trying to do memory allocation during a transaction commit in
XFS, and that is looping in kmem_alloc().
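For reference, the allocation loop it is spinning in looks roughly like
this - paraphrasing fs/xfs/kmem.c from memory, so read it as a sketch of
the retry behaviour rather than the exact source:

	void *
	kmem_alloc(size_t size, xfs_km_flags_t flags)
	{
		int	retries = 0;
		gfp_t	lflags = kmem_flags_convert(flags);	/* KM_NOFS -> GFP_NOFS, etc. */
		void	*ptr;

		do {
			ptr = kmalloc(size, lflags);
			if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
				return ptr;
			/* complain every 100 failed attempts - the message above */
			if (!(++retries % 100))
				xfs_err(NULL,
			"possible memory allocation deadlock in %s (mode:0x%x)",
						__func__, lflags);
			/* back off briefly, then retry forever */
			congestion_wait(BLK_RW_ASYNC, HZ/50);
		} while (1);
	}

The commit path here is not a KM_MAYFAIL allocation (otherwise it would
have returned NULL rather than looped), so the only way out of that loop
is for reclaim to actually free something - which is exactly what is not
happening.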
The problem in this case is that kswapd was not being started to free
slab cache memory (i.e. to handle the deferred GFP_NOFS slab reclaim).
The machine stayed in the livelock state for over an hour before I broke
it by manually running "echo 2 > /proc/sys/vm/drop_caches". That
immediately freed up the slab caches, as that reclaim was not under
GFP_NOFS constraints; the livelock went away and the system started to
make progress again.

I haven't seen this problem before, so this may be a regression...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx