On Sat 01-06-13 02:11:51, Johannes Weiner wrote:
[...]
> I'm currently messing around with the below patch.  When a task faults
> and charges under OOM, the memcg is remembered in the task struct and
> then made to sleep on the memcg's OOM waitqueue only after unwinding
> the page fault stack.  With the kernel OOM killer disabled, all tasks
> in the OOMing group sit nicely in
>
>   mem_cgroup_oom_synchronize
>   pagefault_out_of_memory
>   mm_fault_error
>   __do_page_fault
>   page_fault
>   0xffffffffffffffff
>
> regardless of whether they were faulting anon or file.  They do not
> even hold the mmap_sem anymore at this point.
>
> [ I kept syscalls really simple for now and just have them return
>   -ENOMEM, never trap them at all (just like the global OOM case).
>   It would be more work to have them wait from a flatter stack too,
>   but it should be doable if necessary. ]
>
> I suggested this at the MM summit and people were essentially asking
> if I was feeling well, so maybe I'm still missing a gaping hole in
> this idea.

I didn't get to look at the patch (will do on Monday) but it doesn't
sound entirely crazy. Well, we would have to drop mmap_sem, so things
have to be rechecked after the wait, but we are doing that already with
VM_FAULT_RETRY in some archs so it should work.

> Patch only works on x86 as of now, on other architectures memcg OOM
> will invoke the global OOM killer.
[...]
--
Michal Hocko
SUSE Labs
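
For anyone following the thread without the patch at hand, the idea
discussed above can be modelled roughly in userspace C. This is only a
sketch under loud assumptions: every name below (memcg_model,
task_model, charge_page, handle_fault, oom_synchronize, do_fault) is
hypothetical and is not taken from the patch or from the kernel, the
"sleep" is replaced by a stand-in that frees one page, and mmap_sem is
reduced to a boolean. It only shows the ordering: remember the memcg at
the failed charge, unwind and drop the lock, wait from the flat stack,
then retry the fault from scratch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's real definitions. */
struct memcg_model {
    long usage;     /* pages currently charged */
    long limit;     /* hard limit in pages */
    int  oom_waits; /* how often a task waited on this group */
};

struct task_model {
    struct memcg_model *oom_memcg; /* remembered at a failed charge, else NULL */
    bool mmap_sem_held;            /* stand-in for holding mmap_sem */
};

enum fault_result { FAULT_OK, FAULT_OOM };

/* Deep in the fault path: on failure only record the group, never sleep. */
static bool charge_page(struct task_model *t, struct memcg_model *cg)
{
    if (cg->usage < cg->limit) {
        cg->usage++;
        return true;
    }
    t->oom_memcg = cg;
    return false;
}

/* One fault attempt; the lock is dropped before the result is returned. */
static enum fault_result handle_fault(struct task_model *t,
                                      struct memcg_model *cg)
{
    enum fault_result r;

    t->mmap_sem_held = true;
    r = charge_page(t, cg) ? FAULT_OK : FAULT_OOM;
    t->mmap_sem_held = false; /* fault stack unwound */
    return r;
}

/* Runs after unwinding. Returns true if an OOM wait was pending. */
static bool oom_synchronize(struct task_model *t)
{
    struct memcg_model *cg = t->oom_memcg;

    if (!cg)
        return false;
    assert(!t->mmap_sem_held); /* the point of waiting from a flat stack */
    cg->oom_waits++;
    /* Stand-in for sleeping until somebody in the group frees memory. */
    if (cg->usage > 0)
        cg->usage--;
    t->oom_memcg = NULL;
    return true;
}

/*
 * Top level: the lock was dropped while waiting, so nothing from the
 * failed attempt is reused and the fault is simply redone.
 * Returns the number of retries needed, or -1 if the budget ran out.
 */
static int do_fault(struct task_model *t, struct memcg_model *cg,
                    int max_retries)
{
    for (int i = 0; i <= max_retries; i++) {
        if (handle_fault(t, cg) == FAULT_OK)
            return i;
        oom_synchronize(t);
    }
    return -1;
}
```

With a group that is exactly at its limit, do_fault() in this model
fails once, waits once without the lock held, and succeeds on the first
retry. A group with a limit of zero never succeeds and exhausts the
retry budget.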