On Wed, 4 Dec 2013, Johannes Weiner wrote:

> > Now that a per-process flag is available, define it for processes that
> > handle userspace oom notifications. This is an optimization to avoid
> > maintaining a list of such processes attached to a memcg at any given time
> > and iterating it at charge time.
> >
> > This flag gets set whenever a process has registered for an oom
> > notification and is cleared whenever it unregisters.
> >
> > When memcg reclaim has failed to free any memory, it is necessary for
> > userspace oom handlers to be able to dip into reserves to pagefault text,
> > allocate kernel memory to read the "tasks" file, allocate heap, etc.
>
> The task handling the OOM of a memcg can obviously not be part of that
> same memcg.
>

Not without memory.oom_reserve_in_bytes that this series adds, that's
true. Michal expressed interest in the idea of memcg oom reserves in the
past, so I thought I'd share the series.

> On Tue, 3 Dec 2013 at 15:35:48 +0800, Li Zefan wrote:
> > On Mon, 2 Dec 2013 at 11:44:06 -0500, Johannes Weiner wrote:
> > > On Fri, Nov 29, 2013 at 03:05:25PM -0500, Tejun Heo wrote:
> > > > Whoa, so we support oom handler inside the memcg that it handles?
> > > > Does that work reliably? Changing the above detail in this patch
> > > > isn't difficult (and we'll later need to update kernfs too) but
> > > > supporting such setup properly would be a *lot* of commitment and I'm
> > > > very doubtful we'd be able to achieve that by just carefully avoiding
> > > > memory allocation in the operations that userland oom handler uses -
> > > > that set is destined to expand over time, extremely fragile and will
> > > > be hellish to maintain.
> > > >

It works reliably with this patch series, yes. I'm not sure which change
this refers to that would avoid memory allocation for userspace oom
handlers, and I'd agree that it would be difficult to maintain a
no-allocation policy for the subset of processes that are destined to be
oom handlers.
That's not what this series is addressing, though, and in fact it's quite
the opposite. It acknowledges that userspace oom handlers need to
allocate and that anything else would be too difficult to maintain
(thereby agreeing with the above), so we must set aside memory that they
are exclusively allowed to access. The vast majority of users, who will
not use userspace oom handlers, can just keep the default value of
memory.oom_reserve_in_bytes == 0 and incur absolutely no side-effects as
a result of this series.

For those who do use userspace oom handlers, like Google, this allows us
to set aside memory so that the userspace oom handlers can kill a
process, dump the heap, send a signal, drop caches, etc. when waking up.

> > > > So, I'm not at all excited about committing to this guarantee. This
> > > > one is an easy one but it looks like the first step onto dizzying
> > > > slippery slope.
> > > >
> > > > Am I misunderstanding something here? Are you and Johannes firm on
> > > > supporting this?
> > >
> > > Handling a memcg OOM from userspace running inside that OOM memcg is
> > > completely crazy. I mean, think about this for just two seconds...
> > > Really?
> > >
> > > I get that people are doing it right now, and if you can get away with
> > > it for now, good for you. But you have to be aware how crazy this is
> > > and if it breaks you get to keep the pieces and we are not going to
> > > accommodate this in the kernel. Fix your crazy userspace.
> >

The rest of this email communicates only one thing: someone thinks it's
crazy. And I agree it would be crazy if we didn't allow that class of
process to have access to a pre-defined amount of memory to handle the
situation, which is what this series adds.