Re: [PATCH] mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the minimum pagelist

On Mon, Jul 01, 2024 at 07:51:43PM -0700, Andrew Morton wrote:
> On Mon,  1 Jul 2024 22:20:46 +0800 Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> 
> > Currently, we're encountering latency spikes in our container environment
> > when a specific container with multiple Python-based tasks exits. These
> > tasks may hold the zone->lock for an extended period, significantly
> > impacting latency for other containers attempting to allocate memory.
> 
> Is this locking issue well understood? 

I cannot speak for others, but I believe this problem is well
understood. The zone->lock is an incredibly large lock at this point,
protecting an unbounded amount of data. It keeps getting worse over time;
it was already terrible a few years ago, let alone now.

> Is anyone working on it? 

Not that I'm aware of, but I've paid so little attention to linux-mm in
the last few years that this isn't saying much.

The main problem is that it's hard to solve quickly: splitting that
lock is possible, but not trivial. I am mildly concerned that more and
more people are looking for ways of getting around zone->lock contention
using the PCP allocator. I believe that to be a losing battle even though
I added THP to the PCP caching myself. Now we have dynamic resizing, which
works ok, but piling on top of it are file-backed THPs, THPs smaller than
MAX_ORDER, folios in general, etc. Dealing with that within PCP has limits,
and adding more sysctls to deal with corner cases is a band-aid that most
users will probably miss.

Working around all the zone->lock issues in PCP just delays the
inevitable, as PCP doesn't play well with overall availability (e.g.
high-order pages are free but on a remote CPU), fragmentation control
(fragmentation fallbacks because the desired page types are on a remote
CPU) or scaling (because ultimately it can still contend on zone->lock).
IIUC, pcp lists were originally about preserving cache hotness, with
zone->lock contention reduction as a bonus, but now they are a band-aid for
zone->lock covering massive amounts of memory.
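A toy userspace model may make that trade-off concrete. It is a sketch under stated assumptions, not kernel code: Zone, PerCpuList, batch and high are hypothetical names loosely inspired by the per-CPU pagelists, and a Python threading.Lock stands in for zone->lock. It shows both sides of the argument: the shared lock is taken once per batch rather than once per page, but pages cached on one CPU's list are invisible to another CPU, which can then find the zone empty.

```python
import threading


class Zone:
    """Hypothetical stand-in for a zone: one free list behind one lock."""

    def __init__(self, nr_pages):
        self.lock = threading.Lock()
        self.free = list(range(nr_pages))
        self.lock_acquisitions = 0


class PerCpuList:
    """Hypothetical per-CPU cache in front of a Zone (not the kernel's pcp code)."""

    def __init__(self, zone, batch, high):
        self.zone = zone
        self.pages = []
        self.batch = batch  # pages moved per refill or drain
        self.high = high    # drain back to the zone above this many cached pages

    def alloc(self):
        if not self.pages:
            # Refill: the only allocation path that takes the shared lock.
            with self.zone.lock:
                self.zone.lock_acquisitions += 1
                for _ in range(min(self.batch, len(self.zone.free))):
                    self.pages.append(self.zone.free.pop())
        # None means the zone looked empty to this CPU, even if another
        # CPU's list still caches free pages.
        return self.pages.pop() if self.pages else None

    def free(self, page):
        self.pages.append(page)
        if len(self.pages) > self.high:
            # Drain one batch back to the zone under the shared lock.
            with self.zone.lock:
                self.zone.lock_acquisitions += 1
                for _ in range(min(self.batch, len(self.pages))):
                    self.zone.free.append(self.pages.pop())


zone = Zone(64)
cpu0 = PerCpuList(zone, batch=8, high=32)
cpu1 = PerCpuList(zone, batch=8, high=32)

held = [cpu0.alloc() for _ in range(64)]
print(zone.lock_acquisitions)  # → 8 (64 allocations, one lock round trip per batch)

for page in held[:16]:
    cpu0.free(page)  # 16 <= high, so nothing goes back to the zone

print(cpu1.alloc(), len(zone.free), len(cpu0.pages))  # → None 0 16
```

In this model a larger batch cuts lock round trips but strands more pages per CPU, which is the availability problem described above. The kernel's real batch and high sizing is more involved and is not reproduced here.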

Eventually the work will have to be put into splitting the zone lock using
something akin to memory arenas and moving away from zone_id to identify
what range of free lists a particular page belongs to.
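One way to picture the arena idea is the hypothetical toy model below, which is an illustration only and not a design proposed in this thread. A zone's PFN range is cut into fixed-size arenas (ARENA_SHIFT, Arena and ArenaZone are invented names), each with its own lock and free list, and, as an assumption of the sketch, a page's arena is computed from its PFN instead of being found through a zone id.

```python
import threading

ARENA_SHIFT = 4  # hypothetical: 16 pages per arena, tiny for illustration


class Arena:
    """One independently locked slice of a zone's free pages (hypothetical)."""

    def __init__(self, first_pfn, nr_pages):
        self.lock = threading.Lock()
        self.free = list(range(first_pfn, first_pfn + nr_pages))


class ArenaZone:
    """Toy zone whose free pages are split across per-arena locks."""

    def __init__(self, nr_pages):
        arena_pages = 1 << ARENA_SHIFT
        nr_arenas = (nr_pages + arena_pages - 1) >> ARENA_SHIFT
        self.arenas = [
            Arena(i << ARENA_SHIFT, min(arena_pages, nr_pages - (i << ARENA_SHIFT)))
            for i in range(nr_arenas)
        ]

    def arena_of(self, pfn):
        # The page's arena is derived from its PFN, not from a zone id.
        return self.arenas[pfn >> ARENA_SHIFT]

    def free_page(self, pfn):
        arena = self.arena_of(pfn)
        with arena.lock:  # contends only with users of this arena
            arena.free.append(pfn)

    def alloc_page(self, preferred):
        # Try the preferred arena first, then fall back to the others,
        # holding only one arena lock at a time.
        n = len(self.arenas)
        for i in range(n):
            arena = self.arenas[(preferred + i) % n]
            with arena.lock:
                if arena.free:
                    return arena.free.pop()
        return None


zone = ArenaZone(64)
print(len(zone.arenas))  # → 4
pfn = zone.alloc_page(preferred=2)
print(pfn, zone.arena_of(pfn) is zone.arenas[2])  # → 47 True
```

Frees and allocations that land in different arenas take different locks, so they no longer serialise on one zone-wide lock. A real design would also have to handle buddy merging across arena boundaries, migratetypes and fallback policy, none of which this sketch attempts.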

-- 
Mel Gorman
SUSE Labs



