On Fri, Apr 12, 2019 at 03:14:18PM -0400, Johannes Weiner wrote:
> With the default overcommit==guess we occasionally run into mmap
> rejections despite plenty of memory that would get dropped under
> pressure but just isn't accounted reclaimable. One example of this is
> dying cgroups pinned by some page cache. A previous case was auxiliary
> path name memory associated with dentries; we have since annotated
> those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
> treat indirectly reclaimable memory as free in overcommit logic")).
>
> But trying to classify all allocated memory reliably as reclaimable
> and unreclaimable is a bit of a fool's errand. There could be a myriad
> of dependencies that constantly change with kernel versions.
>
> It becomes even more questionable of an effort when considering how
> this estimate of available memory is used: it's not compared to the
> system-wide allocated virtual memory in any way. It's not even
> compared to the allocating process's address space. It's compared to
> the single allocation request at hand!
>
> So we have an elaborate left-hand side of the equation that tries to
> assess the exact breathing room the system has available down to a
> page - and then compare it to an isolated allocation request with no
> additional context. We could fail an allocation of N bytes, but for
> two allocations of N/2 bytes we'd do this elaborate dance twice in a
> row and then still let N bytes of virtual memory through. This doesn't
> make a whole lot of sense.
>
> Let's take a step back and look at the actual goal of the
> heuristic. From the documentation:
>
>    Heuristic overcommit handling. Obvious overcommits of address
>    space are refused. Used for a typical system. It ensures a
>    seriously wild allocation fails while allowing overcommit to
>    reduce swap usage. root is allowed to allocate slightly more
>    memory in this mode. This is the default.
>
> If all we want to do is catch clearly bogus allocation requests
> irrespective of the general virtual memory situation, the physical
> memory counter-part doesn't need to be that complicated, either.
>
> When in GUESS mode, catch wild allocations by comparing their request
> size to total amount of ram and swap in the system.
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

My 2c here: any kind of percpu counter or other percpu data is
accounted as unreclaimable and can alter the calculation significantly.

This is a particular problem on hosts that have been idle for some
time. Without any memory pressure, kernel caches occupy most of the
memory, so a subsequent attempt to start a workload fails.

With great pleasure:
Acked-by: Roman Gushchin <guro@xxxxxx>

Thanks!
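
For illustration, the two checks discussed in the changelog can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the patch itself: `struct mem_totals`, `guess_allows()` and `old_estimate_allows()` are hypothetical names, and the page counts are supplied by the caller rather than read from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of system totals, in pages (illustration only). */
struct mem_totals {
    size_t total_ram_pages;   /* all physical RAM */
    size_t total_swap_pages;  /* all configured swap */
};

/*
 * Sketch of the GUESS-mode check described above: a single request is
 * refused only when it is larger than RAM and swap combined.
 * Written as a subtraction so the sum cannot overflow size_t.
 */
static bool guess_allows(const struct mem_totals *m, size_t request_pages)
{
    if (request_pages <= m->total_ram_pages)
        return true;
    return request_pages - m->total_ram_pages <= m->total_swap_pages;
}

/*
 * Sketch of the older style of check, assuming the caller has already
 * computed some estimate of free-or-reclaimable pages. Each request is
 * compared to that estimate in isolation, with no memory of earlier ones.
 */
static bool old_estimate_allows(size_t estimated_free_pages,
                                size_t request_pages)
{
    return request_pages <= estimated_free_pages;
}
```

With an estimate of 100 pages, `old_estimate_allows(100, 150)` rejects one request of 150 pages, while two requests of 75 pages each pass separately, which is the N versus N/2 inconsistency pointed out in the changelog. `guess_allows()` ignores the estimate entirely, so an unreclaimable-looking percpu or cache footprint cannot cause a rejection.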