On 2023/9/25 19:38, Michal Hocko wrote:
> On Mon 25-09-23 17:03:05, Haifeng Xu wrote:
>>
>>
>> On 2023/9/25 15:57, Michal Hocko wrote:
>>> On Fri 22-09-23 07:05:28, Haifeng Xu wrote:
>>>> When application in userland receives oom notification from kernel
>>>> and reads the oom_control file, it's confusing that under_oom is 0
>>>> though the oom killer hasn't finished. The reason is that under_oom
>>>> is cleared before invoking mem_cgroup_out_of_memory(), so move the
>>>> action that unmark under_oom after completing oom handling. Therefore,
>>>> the value of under_oom won't mislead users.
>>>
>>> I do not really remember why are we doing it this way but trying to track
>>> this down shows that we have been doing that since fb2a6fc56be6 ("mm:
>>> memcg: rework and document OOM waiting and wakeup"). So this is an
>>> established behavior for 10 years now. Do we really need to change it
>>> now? The interface is legacy and hopefully no new workloads are
>>> emerging.
>>>
>>> I agree that the placement is surprising but I would rather not change
>>> that unless there is a very good reason for that. Do you have any actual
>>> workload which depends on the ordering? And if yes, how do you deal with
>>> timing when the consumer of the notification just gets woken up after
>>> mem_cgroup_out_of_memory completes?
>>
>> Yes. When the oom event is triggered, we check under_oom every 10 seconds. If it
>> is cleared, we create a new process with a smaller memory allocation to avoid oom again.
>
> OK, I do understand what you mean and I could have made myself
> more clear previously. Even if the state is cleared _after_
> mem_cgroup_out_of_memory then you won't get what you need I am
> afraid. The memcg stays under OOM until a memory is freed (uncharged)
> from that memcg. mem_cgroup_out_of_memory itself doesn't really free
> any memory on its own. It relies on the task to wake up and die or
> oom_reaper to do the work on its behalf. All of that is time dependent.
> under_oom would have to be reimplemented to be cleared when a memory is
> uncharged to meet your demands. Something that has never really been
> the semantic.
>

Yes, but at least it gives the memcg more chance to get some memory freed
before we create the new process.

> Btw. is this something new that you are developing on top of v1? And if
> yes, why don't you use v2?
>

Yes, because v2 doesn't have the "cgroup.event_control" file.