On Fri, Jul 10, 2020 at 07:12:22AM -0700, Shakeel Butt wrote: > On Fri, Jul 10, 2020 at 5:29 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote: > > > > On Thu 09-07-20 12:47:18, Roman Gushchin wrote: > > > Memory.high limit is implemented in a way such that the kernel > > > penalizes all threads which are allocating a memory over the limit. > > > Forcing all threads into the synchronous reclaim and adding some > > > artificial delays allows to slow down the memory consumption and > > > potentially give some time for userspace oom handlers/resource control > > > agents to react. > > > > > > It works nicely if the memory usage is hitting the limit from below, > > > however it works sub-optimal if a user adjusts memory.high to a value > > > way below the current memory usage. It basically forces all workload > > > threads (doing any memory allocations) into the synchronous reclaim > > > and sleep. This makes the workload completely unresponsive for > > > a long period of time and can also lead to a system-wide contention on > > > lru locks. It can happen even if the workload is not actually tight on > > > memory and has, for example, a ton of cold pagecache. > > > > > > In the current implementation writing to memory.high causes an atomic > > > update of page counter's high value followed by an attempt to reclaim > > > enough memory to fit into the new limit. To fix the problem described > > > above, all we need is to change the order of execution: try to push > > > the memory usage under the limit first, and only then set the new > > > high limit. > > > > Shakeel would this help with your pro-active reclaim usecase? It would > > require to reset the high limit right after the reclaim returns which is > > quite ugly but it would at least not require a completely new interface. > > You would simply do > > high = current - to_reclaim > > echo $high > memory.high > > echo infinity > memory.high # To prevent direct reclaim > > # allocation stalls > > > > This will reduce the chance of stalls but the interface is still > non-delegatable i.e. applications can not change their own memory.high > for the use-cases like application controlled proactive reclaim and > uswapd. Can you, please, elaborate a bit more on this? I didn't understand why. > > One more ugly fix would be to add one more layer of cgroup and the > application use memory.high of that layer to fulfill such use-cases. > > I think providing a new interface would allow us to have a much > cleaner solution than to settle on a bunch of ugly hacks. > > > The primary reason to set the high limit in advance was to catch > > potential runaways more effectively because they would just get > > throttled while memory_high_write is reclaiming. With this change > > the reclaim here might be just playing never ending catch up. On the > > plus side a break out from the reclaim loop would just enforce the limit > > so if the operation takes too long then the reclaim burden will move > > over to consumers eventually. So I do not see any real danger. > > > > > Signed-off-by: Roman Gushchin <guro@xxxxxx> > > > Reported-by: Domas Mituzas <domas@xxxxxx> > > > Cc: Johannes Weiner <hannes@xxxxxxxxxxx> > > > Cc: Michal Hocko <mhocko@xxxxxxxxxx> > > > Cc: Tejun Heo <tj@xxxxxxxxxx> > > > Cc: Shakeel Butt <shakeelb@xxxxxxxxxx> > > > Cc: Chris Down <chris@xxxxxxxxxxxxxx> > > > > Acked-by: Michal Hocko <mhocko@xxxxxxxx> > > > > This patch seems reasonable on its own. > > Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx> Thank you!