On Wed, Sep 12, 2018 at 02:35:34PM +0200, Michal Hocko wrote: > On Tue 11-09-18 08:27:30, Roman Gushchin wrote: > > On Tue, Sep 11, 2018 at 02:11:41PM +0200, Michal Hocko wrote: > > > On Mon 10-09-18 14:56:22, Roman Gushchin wrote: > > > > The memcg OOM killer is never invoked due to a failed high-order > > > > allocation, however the MEMCG_OOM event can be easily raised. > > > > > > > > Under some memory pressure it can happen easily because of a > > > > concurrent allocation. Let's look at try_charge(). Even if we were > > > > able to reclaim enough memory, this check can fail due to a race > > > > with another allocation: > > > > > > > > if (mem_cgroup_margin(mem_over_limit) >= nr_pages) > > > > goto retry; > > > > > > > > For regular pages the following condition will save us from triggering > > > > the OOM: > > > > > > > > if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER)) > > > > goto retry; > > > > > > > > But for high-order allocation this condition will intentionally fail. > > > > The reason behind is that we'll likely fall to regular pages anyway, > > > > so it's ok and even preferred to return ENOMEM. > > > > > > > > In this case the idea of raising the MEMCG_OOM event looks dubious. > > > > > > Why is this a problem though? IIRC this event was deliberately placed > > > outside of the oom path because we wanted to count allocation failures > > > and this is also documented that way > > > > > > oom > > > The number of time the cgroup's memory usage was > > > reached the limit and allocation was about to fail. > > > > > > Depending on context result could be invocation of OOM > > > killer and retrying allocation or failing a > > > > > > One could argue that we do not apply the same logic to GFP_NOWAIT > > > requests but in general I would like to see a good reason to change > > > the behavior and if it is really the right thing to do then we need to > > > update the documentation as well. > > > > Right, the current behavior matches the documentation, because the description > > of the event is broad enough. My point is that the current behavior is not > > useful in my corner case. > > > > Let me explain my case in details: I've got a report about sporadic memcg oom > > kills on some hosts with plenty of pagecache and low memory pressure. > > You'll probably agree, that raising OOM signal in this case looks strange. > > I am not sure I follow. So you see both OOM_KILL and OOM events and the > user misinterprets OOM ones? No, I see sporadic OOMs without OOM_KILLs in cgroups with plenty of pagecache and low memory pressure. It's not a pre-OOM condition at all. > > My understanding was that OOM event should tell admin that the limit > should be increased in order to allow more charges. Without OOM_KILL > events it means that those failed charges have some sort of fallback > so it is not critical condition for the workload yet. Something to watch > for though in case of perf. degradation or potential misbehavior. Right, something like "there is a shortage of memory which will likely lead to OOM soon". It's not my case. > > Whether this is how the event is used, I dunno. Anyway, if you want to > just move the event and make it closer to OOM_KILL then I strongly > suspect the event is losing its relevance. I agree here (about losing relevance), but don't think it's a reason to generate misleading events. Thanks!