On Sun 21-12-14 17:45:32, Tetsuo Handa wrote:
[...]
> Traces from uptime > 484 seconds of
> http://I-love.SAKURA.ne.jp/tmp/serial-20141221.txt.xz is a stalled case.

[  548.449780] Out of memory: Kill process 12718 (a.out) score 890 or sacrifice child
[...]
[  954.595576] a.out           D ffff8800764918a0     0 12718      1 0x00100084
[  954.597544]  ffff880077d7fca8 0000000000000086 ffff880076491470 ffff880077d7ffd8
[  954.599565]  0000000000013640 0000000000013640 ffff8800358c8210 ffff880076491470
[  954.601634]  0000000000000000 ffff88007c8a3e48 ffff88007c8a3e4c ffff880076491470
[  954.604091] Call Trace:
[  954.607766]  [<ffffffff81618669>] schedule_preempt_disabled+0x29/0x70
[  954.609792]  [<ffffffff8161a555>] __mutex_lock_slowpath+0xb5/0x120
[  954.611644]  [<ffffffff8161a5e3>] mutex_lock+0x23/0x37
[  954.613256]  [<ffffffffa025fb47>] xfs_file_buffered_aio_write.isra.9+0x77/0x270 [xfs]
[...]

and it seems that it is blocked by another allocator:

[  957.178207] a.out           R  running task        0 12804      1 0x00000084
[  957.180304] MemAlloc: 471962 jiffies on 0x10
[  957.181738]  ffff8800355df868 0000000000000086 ffff88007be98940 ffff8800355dffd8
[  957.183831]  0000000000013640 0000000000013640 ffff88007c4174b0 ffff88007be98940
[  957.185916]  0000000000000000 ffff8800355df940 0000000000000000 ffffffff81a621e8
[  957.188067] Call Trace:
[  957.189130]  [<ffffffff81618509>] _cond_resched+0x29/0x40
[  957.190790]  [<ffffffff8117752a>] shrink_slab+0x17a/0x1d0
[  957.192384]  [<ffffffff8117a330>] do_try_to_free_pages+0x280/0x450
[  957.194117]  [<ffffffff8117a5da>] try_to_free_pages+0xda/0x170
[  957.195800]  [<ffffffff8116db23>] __alloc_pages_nodemask+0x633/0xa50
[  957.197615]  [<ffffffff811b1ce7>] alloc_pages_current+0x97/0x110
[  957.199314]  [<ffffffff81164797>] __page_cache_alloc+0xa7/0xc0
[  957.201026]  [<ffffffff811652b0>] pagecache_get_page+0x70/0x1e0
[  957.202724]  [<ffffffff81165453>] grab_cache_page_write_begin+0x33/0x50
[  957.204546]  [<ffffffffa0252cb4>] xfs_vm_write_begin+0x34/0xe0 [xfs]

but this task managed to make some progress, because we can clearly see that
pid 12718 (the OOM victim) moved on and reached the OOM killer many times:

[  961.062042] a.out(12718) the OOM killer was skipped for 1965000 times.
[...]
[  983.140589] a.out(12718) the OOM killer was skipped for 2059000 times.

This shouldn't happen for the xfs pagecache allocations, because they should
all be !__GFP_FS and we do not trigger the OOM killer in that case; we fail
the allocation instead. But, as Dave already pointed out,
grab_cache_page_write_begin uses GFP_KERNEL for the radix tree allocation,
and that would trigger the OOM killer. The rest is our hopeless attempt not
to fail the allocation.

I believe that the patch from
http://marc.info/?l=linux-mm&m=141987483503279 should help in this
particular case. There are still other cases where we can livelock, but this
looks like a clear bug in grab_cache_page_write_begin.
--
Michal Hocko
SUSE Labs
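
For reference, a rough sketch of the ~3.18-era mm/filemap.c path
(reconstructed from memory, so names and details may differ slightly). The
page itself is allocated with the mapping's gfp mask, which XFS clears
__GFP_FS from, but the radix tree nodes get a hard-coded GFP_KERNEL, so the
supposedly NOFS write path can still end up in the OOM killer:

struct page *grab_cache_page_write_begin(struct address_space *mapping,
					 pgoff_t index, unsigned flags)
{
	struct page *page;
	int fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT;

	if (flags & AOP_FLAG_NOFS)
		fgp_flags |= FGP_NOFS;

	page = pagecache_get_page(mapping, index, fgp_flags,
				  mapping_gfp_mask(mapping),	/* page allocation */
				  GFP_KERNEL);			/* radix tree nodes */
	if (page)
		wait_for_stable_page(page);

	return page;
}

The patch referenced above drops the separate radix tree gfp argument from
pagecache_get_page, so the radix tree allocation inherits the (properly
masked) gfp mask used for the page allocation instead.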