On Tue, Feb 05 2013, Michal Hocko wrote: > On Tue 05-02-13 15:49:47, azurIt wrote: > [...] >> Just to be sure - am i supposed to apply this two patches? >> http://watchdog.sk/lkml/patches/ > > 5-memcg-fix-1.patch is not complete. It doesn't contain the folloup I > mentioned in a follow up email. Here is the full patch: > --- > From f2bf8437d5b9bb38a95a432bf39f32c584955171 Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@xxxxxxx> > Date: Mon, 26 Nov 2012 11:47:57 +0100 > Subject: [PATCH] memcg: do not trigger OOM from add_to_page_cache_locked > > memcg oom killer might deadlock if the process which falls down to > mem_cgroup_handle_oom holds a lock which prevents other task to > terminate because it is blocked on the very same lock. > This can happen when a write system call needs to allocate a page but > the allocation hits the memcg hard limit and there is nothing to reclaim > (e.g. there is no swap or swap limit is hit as well and all cache pages > have been reclaimed already) and the process selected by memcg OOM > killer is blocked on i_mutex on the same inode (e.g. truncate it). > > Process A > [<ffffffff811109b8>] do_truncate+0x58/0xa0 # takes i_mutex > [<ffffffff81121c90>] do_last+0x250/0xa30 > [<ffffffff81122547>] path_openat+0xd7/0x440 > [<ffffffff811229c9>] do_filp_open+0x49/0xa0 > [<ffffffff8110f7d6>] do_sys_open+0x106/0x240 > [<ffffffff8110f950>] sys_open+0x20/0x30 > [<ffffffff815b5926>] system_call_fastpath+0x18/0x1d > [<ffffffffffffffff>] 0xffffffffffffffff > > Process B > [<ffffffff8110a9c1>] mem_cgroup_handle_oom+0x241/0x3b0 > [<ffffffff8110b5ab>] T.1146+0x5ab/0x5c0 > [<ffffffff8110c22e>] mem_cgroup_cache_charge+0xbe/0xe0 > [<ffffffff810ca28c>] add_to_page_cache_locked+0x4c/0x140 > [<ffffffff810ca3a2>] add_to_page_cache_lru+0x22/0x50 > [<ffffffff810ca45b>] grab_cache_page_write_begin+0x8b/0xe0 > [<ffffffff81193a18>] ext3_write_begin+0x88/0x270 > [<ffffffff810c8fc6>] generic_file_buffered_write+0x116/0x290 > [<ffffffff810cb3cc>] __generic_file_aio_write+0x27c/0x480 > [<ffffffff810cb646>] generic_file_aio_write+0x76/0xf0 # takes ->i_mutex > [<ffffffff8111156a>] do_sync_write+0xea/0x130 > [<ffffffff81112183>] vfs_write+0xf3/0x1f0 > [<ffffffff81112381>] sys_write+0x51/0x90 > [<ffffffff815b5926>] system_call_fastpath+0x18/0x1d > [<ffffffffffffffff>] 0xffffffffffffffff It looks like grab_cache_page_write_begin() passes __GFP_FS into __page_cache_alloc() and mem_cgroup_cache_charge(). Which makes me think that this deadlock is also possible in the page allocator even before getting to add_to_page_cache_lru. no? Can callers holding fs resources (e.g. i_mutex) pass __GFP_FS into the page allocator? If __GFP_FS was avoided, then I think memcg user page charging would need a !__GFP_FS check to avoid invoking oom killer, but at least then we'd avoid both deadlocks and cover both page allocation and memcg page charging in similar fashion. Example from memcg_charge_kmem: may_oom = (gfp & __GFP_FS) && !(gfp & __GFP_NORETRY); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>