Re: [PATCH 1/2] memcg, oom: unmark under_oom after the oom killer is done

Haifeng Xu <haifeng.xu@xxxxxxxxxx> · Mon, 25 Sep 2023 20:28:02 +0800

On 2023/9/25 19:38, Michal Hocko wrote:
> On Mon 25-09-23 17:03:05, Haifeng Xu wrote:
>>
>>
>> On 2023/9/25 15:57, Michal Hocko wrote:
>>> On Fri 22-09-23 07:05:28, Haifeng Xu wrote:
>>>> When an application in userland receives an oom notification from the
>>>> kernel and reads the oom_control file, it's confusing that under_oom is 0
>>>> though the oom killer hasn't finished. The reason is that under_oom
>>>> is cleared before invoking mem_cgroup_out_of_memory(), so move the
>>>> action that unmarks under_oom to after oom handling completes. Therefore,
>>>> the value of under_oom won't mislead users.
>>>
>>> I do not really remember why we are doing it this way but trying to track
>>> this down shows that we have been doing that since fb2a6fc56be6 ("mm:
>>> memcg: rework and document OOM waiting and wakeup"). So this has been
>>> established behavior for 10 years now. Do we really need to change it
>>> now? The interface is legacy and hopefully no new workloads are
>>> emerging.
>>>
>>> I agree that the placement is surprising but I would rather not change
>>> that unless there is a very good reason for that. Do you have any actual
>>> workload which depends on the ordering? And if yes, how do you deal with
>>> timing when the consumer of the notification just gets woken up after
>>> mem_cgroup_out_of_memory completes?
>>
>> Yes, when the oom event is triggered, we check under_oom every 10 seconds. If it
>> is cleared, we create a new process with a smaller memory allocation to avoid another oom.
> 
> OK, I do understand what you mean and I could have made myself
> clearer previously. Even if the state is cleared _after_
> mem_cgroup_out_of_memory then you won't get what you need, I am
> afraid. The memcg stays under OOM until memory is freed (uncharged)
> from that memcg. mem_cgroup_out_of_memory itself doesn't really free
> any memory on its own. It relies on the task to wake up and die or
> oom_reaper to do the work on its behalf. All of that is time dependent.
> under_oom would have to be reimplemented to be cleared when memory is
> uncharged to meet your demands. That has never really been
> the semantic.
> 

Yes, but at least by the time we create the new process, there is a better chance that some memory has been freed.
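
A minimal sketch of that kind of polling, assuming the v1 memory.oom_control
file exposes "key value" lines such as "under_oom 0". The cgroup path and the
function names are hypothetical and only there for illustration:

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical cgroup v1 path; adjust to the memcg being watched.
OOM_CONTROL = "/sys/fs/cgroup/memory/mygroup/memory.oom_control"


def parse_oom_control(text: str) -> Dict[str, int]:
    """Parse 'key value' lines (e.g. 'under_oom 1') into a dict."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def read_oom_control(path: str = OOM_CONTROL) -> str:
    """Read the raw contents of memory.oom_control."""
    with open(path) as f:
        return f.read()


def wait_until_not_under_oom(
    read: Callable[[], str] = read_oom_control,
    interval: float = 10.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until under_oom reads 0; return False if max_polls is exhausted."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if parse_oom_control(read()).get("under_oom", 0) == 0:
            return True
        polls += 1
        sleep(interval)
    return False
```

Note that under_oom reading 0 only says the flag was cleared. As pointed out
above, it does not guarantee that memory has already been uncharged, so the
replacement process can still hit the limit.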

> Btw. is this something new that you are developing on top of v1? And if
> yes, why don't you use v2?
> 

Yes. v2 doesn't have the "cgroup.event_control" file.
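
For context, a rough sketch of how an oom notifier is registered on v1,
following the steps in the cgroup-v1 memory documentation: create an eventfd,
open memory.oom_control, and write "<event_fd> <fd of memory.oom_control>" to
cgroup.event_control. The function names and the cgroup directory argument are
hypothetical; os.eventfd needs Python 3.10+ on Linux:

```python
import os
from typing import Tuple


def registration_line(event_fd: int, control_fd: int) -> str:
    """Build the string written to cgroup.event_control."""
    return f"{event_fd} {control_fd}"


def register_oom_notifier(cgroup_dir: str) -> Tuple[int, int]:
    """Register an eventfd for oom events on a v1 memcg.

    Returns (event_fd, control_fd); the caller owns both descriptors.
    """
    event_fd = os.eventfd(0)
    try:
        control_fd = os.open(
            os.path.join(cgroup_dir, "memory.oom_control"), os.O_RDONLY
        )
    except OSError:
        os.close(event_fd)
        raise
    try:
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ec:
            ec.write(registration_line(event_fd, control_fd))
    except OSError:
        os.close(control_fd)
        os.close(event_fd)
        raise
    return event_fd, control_fd


def wait_for_oom_event(event_fd: int) -> int:
    """Block until the kernel signals the eventfd; return its counter value."""
    return os.eventfd_read(event_fd)
```

After wait_for_oom_event() returns, the listener would go on to read
memory.oom_control and check under_oom.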