On Thu 09-07-20 12:47:18, Roman Gushchin wrote:
> The memory.high limit is implemented in a way such that the kernel
> penalizes all threads which are allocating memory over the limit.
> Forcing all threads into synchronous reclaim and adding some
> artificial delays slows down the memory consumption and
> potentially gives userspace oom handlers/resource control
> agents some time to react.
>
> It works nicely if the memory usage is hitting the limit from below,
> however it works sub-optimally if a user adjusts memory.high to a value
> way below the current memory usage. It basically forces all workload
> threads (doing any memory allocations) into synchronous reclaim
> and sleep. This makes the workload completely unresponsive for
> a long period of time and can also lead to a system-wide contention on
> lru locks. It can happen even if the workload is not actually tight on
> memory and has, for example, a ton of cold pagecache.
>
> In the current implementation writing to memory.high causes an atomic
> update of the page counter's high value followed by an attempt to reclaim
> enough memory to fit into the new limit. To fix the problem described
> above, all we need is to change the order of execution: try to push
> the memory usage under the limit first, and only then set the new
> high limit.

Shakeel, would this help with your proactive reclaim use case? It would
require resetting the high limit right after the reclaim returns, which
is quite ugly, but it would at least not require a completely new
interface. You would simply do

	high=$((current - to_reclaim))
	echo $high > memory.high
	echo max > memory.high	# to prevent direct reclaim
				# allocation stalls

The primary reason to set the high limit in advance was to catch
potential runaways more effectively, because they would get throttled
while memory_high_write is reclaiming. With this change the reclaim
here might just be playing a never-ending game of catch-up. On the plus
side, a break-out from the reclaim loop would still enforce the limit,
so if the operation takes too long the reclaim burden will eventually
move over to the consumers. So I do not see any real danger.

> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Reported-by: Domas Mituzas <domas@xxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Cc: Chris Down <chris@xxxxxxxxxxxxxx>

Acked-by: Michal Hocko <mhocko@xxxxxxxx>

> ---
>  mm/memcontrol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b8424aa56e14..4b71feee7c42 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6203,8 +6203,6 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  	if (err)
>  		return err;
>
> -	page_counter_set_high(&memcg->memory, high);
> -
>  	for (;;) {
>  		unsigned long nr_pages = page_counter_read(&memcg->memory);
>  		unsigned long reclaimed;
> @@ -6228,6 +6226,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  		break;
>  	}
>
> +	page_counter_set_high(&memcg->memory, high);
> +
>  	return nbytes;
>  }
>
> --
> 2.26.2

--
Michal Hocko
SUSE Labs
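
To make the proactive reclaim sequence above concrete, here is a minimal
sketch, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup; the
cgroup name "workload" and the 256M reclaim target are placeholders for
illustration, not part of the patch or the discussion:

	#!/bin/sh
	# Sketch of the proactive reclaim sequence discussed above.
	# Assumes cgroup v2 at /sys/fs/cgroup; "workload" and the
	# reclaim amount are placeholders.
	cg=/sys/fs/cgroup/workload
	to_reclaim=$((256 * 1024 * 1024))	# bytes to reclaim (placeholder)

	current=$(cat "$cg/memory.current")	# current usage in bytes
	high=$((current - to_reclaim))		# temporary reclaim target

	# With the patch applied, this write reclaims down to the target
	# first and only then publishes the new limit, so the workload is
	# not throttled for the whole duration of the reclaim.
	echo "$high" > "$cg/memory.high"

	# Reset to "max" to prevent direct reclaim allocation stalls
	# afterwards.
	echo max > "$cg/memory.high"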