On Mon, May 4, 2020 at 11:36 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> On Mon, May 4, 2020 at 8:00 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > On Mon 04-05-20 07:53:01, Shakeel Butt wrote:
> > > On Mon, May 4, 2020 at 7:11 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon 04-05-20 06:54:40, Shakeel Butt wrote:
> > > > > On Sun, May 3, 2020 at 11:56 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu 30-04-20 11:27:12, Shakeel Butt wrote:
> > > > > > > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > > > > > > succeed. However if oom-killer does not find a process for killing, it
> > > > > > > dumps a lot of warnings.
> > > > > >
> > > > > > It shouldn't dump much more than the regular OOM report AFAICS. Sure
> > > > > > there is "Out of memory and no killable processes..." message printed as
> > > > > > well but is that a real problem?
> > > > > >
> > > > > > > Deleting a memcg does not reclaim memory from it and the memory can
> > > > > > > linger till there is a memory pressure. One normal way to proactively
> > > > > > > reclaim such memory is to set memory.max to 0 just before deleting the
> > > > > > > memcg. However if some of the memcg's memory is pinned by others, this
> > > > > > > operation can trigger an oom-kill without any process and thus can log a
> > > > > > > lot of unneeded warnings. So, ignore all such warnings from memory.max.
> > > > > >
> > > > > > OK, I can see why you might want to use memory.max for that purpose but
> > > > > > I do not really understand why the oom report is a problem here.
> > > > >
> > > > > It may not be a problem for an individual or small scale deployment
> > > > > but when "sweep before tear down" is the part of the workflow for
> > > > > thousands of machines cycling through hundreds of thousands of cgroups
> > > > > then we can potentially flood the logs with not useful dumps and may
> > > > > hide (or overflow) any useful information in the logs.
> > > >
> > > > If you are doing this in a large scale and the oom report is really a
> > > > problem then you shouldn't be resetting hard limit to 0 in the first
> > > > place.
> > > >
> > > I think I have pretty clearly described why we want to reset the hard
> > > limit to 0, so, unless there is an alternative I don't see why we
> > > should not be doing this.
> >
> > I am not saying you shouldn't be doing that. I am just saying that if
> > you do then you have to live with oom reports.
> >
> > > > > > memory.max can trigger the oom kill and user should be expecting the oom
> > > > > > report under that condition. Why is "no eligible task" so special? Is it
> > > > > > because you know that there won't be any tasks for your particular case?
> > > > > > What about other use cases where memory.max is not used as a "sweep
> > > > > > before tear down"?
> > > > >
> > > > > What other such use-cases would be? The only use-case I can envision
> > > > > of adjusting limits dynamically of a live cgroup are resource
> > > > > managers. However for cgroup v2, memory.high is the recommended way to
> > > > > limit the usage, so, why would resource managers be changing
> > > > > memory.max instead of memory.high? I am not sure. What do you think?
> > > >
> > > > There are different reasons to use the hard limit. Mostly to contain
> > > > potential runaways. While high limit might be a sufficient measure to
> > > > achieve that as well the hard limit is the last resort. And it clearly
> > > > has the oom killer semantic so I am not really sure why you are
> > > > comparing the two.
> > > >
> > >
> > > I am trying to see if "no eligible task" is really an issue and should
> > > be warned for the "other use cases". The only real use-case I can
> > > think of are resource managers adjusting the limit dynamically. I
> > > don't see "no eligible task" a concerning reason for such use-case.
> >
> > It is very much a concerning reason to notify about like any other OOM
> > situation due to hard limit breach. In this case it is worse in some
> > sense because the limit cannot be trimmed down because there is no
> > directly reclaimable memory at all. Such an oom situation is
> > effectively conserved.
> > --
>
> Let me make a more precise statement and tell me if you agree. The "no
> eligible task" is concerning for the charging path but not for the
> writer of memory.max. The writer can read the usage and
> cgroup.[procs|events] to figure out the situation if needed.

Agreed.
cgroup.[procs|events] can give the admin everything they want in this
situation. The oom report is redundant information, really.

> Usually
> such writers (i.e. resource managers) use memory.high in addition to
> memory.max. First set memory.high and once the usage is below the high
> then set max to not induce the oom-kills.
>

--
Thanks
Yafang
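
For illustration only, here is a rough userspace sketch of the sequence
described above: set memory.high first, wait for memory.current to drop
below it, and only then lower memory.max. It also shows how a writer could
inspect memory.current, cgroup.procs and cgroup.events instead of relying
on an oom report. The helper names (read_value, write_value, describe,
lower_limit), the timeout and the polling interval are hypothetical; only
the cgroup v2 file names come from the discussion. This is not how any
particular resource manager does it.

```python
import os
import time


def read_value(cgroup_dir, name):
    """Return the stripped text of one cgroup v2 interface file."""
    with open(os.path.join(cgroup_dir, name)) as f:
        return f.read().strip()


def write_value(cgroup_dir, name, value):
    """Write a value to one cgroup v2 interface file."""
    with open(os.path.join(cgroup_dir, name), "w") as f:
        f.write(str(value))


def describe(cgroup_dir):
    """Collect what a memory.max writer can inspect on its own."""
    # cgroup.events holds "key value" pairs, one per line (e.g. "populated 0").
    events = dict(
        line.split() for line in read_value(cgroup_dir, "cgroup.events").splitlines()
    )
    return {
        "usage": int(read_value(cgroup_dir, "memory.current")),
        "procs": read_value(cgroup_dir, "cgroup.procs").split(),
        "populated": events.get("populated") == "1",
    }


def lower_limit(cgroup_dir, limit, timeout=5.0, poll=0.1):
    """Lower memory.high, wait for usage to fit, then lower memory.max.

    Returns True if memory.max was lowered, False if usage stayed above
    the limit until the (hypothetical) timeout expired. In the False case
    memory.max is left untouched, so no oom-kill is induced by this write.
    """
    write_value(cgroup_dir, "memory.high", limit)
    deadline = time.monotonic() + timeout
    while True:
        if int(read_value(cgroup_dir, "memory.current")) <= limit:
            write_value(cgroup_dir, "memory.max", limit)
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

When lower_limit() returns False, calling describe() on the same directory
tells the writer whether any processes are left and how much memory is
still charged, which is the information under discussion here.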