Re: [PATCH for 3.2] memcg: do not trap chargers with full callstack on OOM

"azurIt" <azurit@xxxxxxxx> · Tue, 09 Jul 2013 15:19:21 +0200

>On Mon 08-07-13 01:42:24, azurIt wrote:
>> > CC: "Michal Hocko" <mhocko@xxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, "cgroups mailinglist" <cgroups@xxxxxxxxxxxxxxx>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>> >On Fri, Jul 05, 2013 at 09:02:46PM +0200, azurIt wrote:
>> >> >I looked at your debug messages but could not find anything that would
>> >> >hint at a deadlock.  All tasks are stuck in the refrigerator, so I
>> >> >assume you use the freezer cgroup and enabled it somehow?
>> >> 
>> >> 
>> >> Yes, I'm really using the freezer cgroup, BUT I was checking whether it
>> >> was causing problems. Unfortunately, several days have passed since then
>> >> and I don't fully remember whether I checked it for both cases
>> >> (unremovable cgroups and these frozen processes holding the web
>> >> server port). I'm 100% sure I checked it for the unremovable
>> >> cgroups but not so sure for the other problem (I had to act quickly
>> >> in that case). Are you sure (from the stacks) that the freezer cgroup
>> >> was enabled there?
>> >
>> >Yeah, all the traces without exception look like this:
>> >
>> >1372089762/23433/stack:[<ffffffff81080925>] refrigerator+0x95/0x160
>> >1372089762/23433/stack:[<ffffffff8106ab7b>] get_signal_to_deliver+0x1cb/0x540
>> >1372089762/23433/stack:[<ffffffff8100188b>] do_signal+0x6b/0x750
>> >1372089762/23433/stack:[<ffffffff81001fc5>] do_notify_resume+0x55/0x80
>> >1372089762/23433/stack:[<ffffffff815cac77>] int_signal+0x12/0x17
>> >1372089762/23433/stack:[<ffffffffffffffff>] 0xffffffffffffffff
>> >
>> >so the freezer was already enabled when you took the backtraces.
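This check can be done mechanically over the whole dump. Below is a minimal Python sketch, assuming the stacks were saved from /proc/<pid>/stack with a "file:" prefix like the trace quoted above. The helper names frame_symbols() and is_frozen() are hypothetical. The sketch treats a task as frozen when any of its frames is refrigerator, the freezer's wait function on this 3.2 kernel.

```python
import re

# One /proc/<pid>/stack frame, optionally prefixed by "file:" as in the dump:
#   1372089762/23433/stack:[<ffffffff81080925>] refrigerator+0x95/0x160
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)")


def frame_symbols(lines):
    """Return the bare symbol name of each frame (offset/size stripped)."""
    symbols = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            # "refrigerator+0x95/0x160" -> "refrigerator"
            symbols.append(m.group(2).split("+", 1)[0])
    return symbols


def is_frozen(lines, freezer_fn="refrigerator"):
    """True if the task is parked in the freezer's wait function.

    On 3.2 kernels that function is refrigerator(); later kernels renamed
    it to __refrigerator(), hence the parameter.
    """
    return freezer_fn in frame_symbols(lines)
```

If every stack file in the dump returns True, that matches the conclusion that the freezer was active. A task whose frames show the page fault or reclaim path instead would point somewhere else.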
>> >
>> >> Btw, what about the other stacks? I mean this file:
>> >> http://watchdog.sk/lkml/memcg-bug-7.tar.gz
>> >> 
>> >> It was taken while running the kernel with your patch, from a
>> >> cgroup which was under an unresolvable OOM (just like my very original
>> >> problem).
>> >
>> >I looked at these traces too, but none of the tasks are stuck in rmdir
>> >or the OOM path.  Some /are/ in the page fault path, but they are
>> >happily doing reclaim and don't appear to be stuck.  So I'm having a
>> >hard time matching this data to what you otherwise observed.
>
>Agreed.
>
>> >However, based on what you reported the most likely explanation for
>> >the continued hangs is the unfinished OOM handling for which I sent
>> >the followup patch for arch/x86/mm/fault.c.
>> 
>> Johannes,
>> 
>> today I tested both of your patches, but the problem with unremovable
>> cgroups unfortunately persists.
>
>Is the group empty again, with under_oom marked?

Now I realize that I forgot to remove the UID from that cgroup before trying to remove it, so the cgroup could not have been removed anyway. (We are using a third-party cgroup called cgroup-uid, by Andrea Righi, which associates all of a user's processes with a target cgroup.) The cgroup-uid patch is here:
https://www.develer.com/~arighi/linux/patches/cgroup-uid/cgroup-uid-v8.patch

ANYWAY, I'm 101% sure that the 'tasks' file was empty and 'under_oom' was permanently '1'.
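For anyone wanting to reproduce this observation, here is a minimal Python sketch, assuming a cgroup v1 memory controller directory such as /sys/fs/cgroup/memory/<group> (the mount point is an assumption and differs between setups). memory.oom_control is read as "key value" lines that include under_oom. The helper names are hypothetical, and sampling several times only approximates "permanently".

```python
import os
import time


def parse_oom_control(text):
    """Parse memory.oom_control ("key value" per line) into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            fields[parts[0]] = int(parts[1])
    return fields


def looks_stuck(tasks_text, oom_control_text):
    """The symptom reported here: 'tasks' is empty but under_oom is still 1."""
    no_tasks = not tasks_text.strip()
    under_oom = parse_oom_control(oom_control_text).get("under_oom") == 1
    return no_tasks and under_oom


def check_memcg(path):
    """Read 'tasks' and 'memory.oom_control' from one memcg directory."""
    with open(os.path.join(path, "tasks")) as f:
        tasks = f.read()
    with open(os.path.join(path, "memory.oom_control")) as f:
        oom_control = f.read()
    return looks_stuck(tasks, oom_control)


def stays_stuck(path, samples=10, interval=1.0):
    """True only if every one of `samples` reads shows the stuck state."""
    for _ in range(samples):
        if not check_memcg(path):
            return False
        time.sleep(interval)
    return True
```

Calling stays_stuck("/sys/fs/cgroup/memory/<group>") on the affected group would return True for the state described above. It does not account for the lingering cgroup-uid association, which blocks removal on its own.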

azur
