At 2023-05-17 16:09:50, "Yosry Ahmed" <yosryahmed@xxxxxxxxxx> wrote:
>On Wed, May 17, 2023 at 1:01 AM 程垲涛 Chengkaitao Cheng
><chengkaitao@xxxxxxxxxxxxxx> wrote:
>>
>> At 2023-05-17 14:59:06, "Yosry Ahmed" <yosryahmed@xxxxxxxxxx> wrote:
>> >+David Rientjes
>> >
>> >On Tue, May 16, 2023 at 8:20 PM chengkaitao <chengkaitao@xxxxxxxxxxxxxx> wrote:
>> >>
>> >> Establish a new OOM score algorithm, supports the cgroup level OOM
>> >> protection mechanism. When an global/memcg oom event occurs, we treat
>> >> all processes in the cgroup as a whole, and OOM killers need to select
>> >> the process to kill based on the protection quota of the cgroup.
>> >>
>> >
>> >Perhaps this is only slightly relevant, but at Google we do have a
>> >different per-memcg approach to protect from OOM kills, or more
>> >specifically tell the kernel how we would like the OOM killer to
>> >behave.
>> >
>> >We define an interface called memory.oom_score_badness, and we also
>> >allow it to be specified per-process through a procfs interface,
>> >similar to oom_score_adj.
>> >
>> >These scores essentially tell the OOM killer the order in which we
>> >prefer memcgs to be OOM'd, and the order in which we want processes in
>> >the memcg to be OOM'd. By default, all processes and memcgs start with
>> >the same score. Ties are broken based on the rss of the process or the
>> >usage of the memcg (prefer to kill the process/memcg that will free
>> >more memory) -- similar to the current OOM killer.
>>
>> Thank you for providing a new application scenario. You have described a
>> new per-memcg approach, but a simple introduction cannot explain the
>> details of your approach clearly. If you could compare and analyze my
>> patches for possible defects, or if your new approach has advantages
>> that my patches do not have, I would greatly appreciate it.
>
>Sorry if I was not clear, I am not implying in any way that the
>approach I am describing is better than your patches. I am guilty of
>not conducting the proper analysis you are requesting.

No approach is perfect, and I am asking for your advice so that I can
learn. You don't need to say sorry; I should say thank you.

>I just saw the thread and thought it might be interesting to you or
>others to know the approach that we have been using for years in our
>production. I guess the target is the same, be able to tell the OOM
>killer which memcgs/processes are more important to protect. The
>fundamental difference is that instead of tuning this based on the
>memory usage of the memcg (your approach), we essentially give the OOM
>killer the ordering in which we want memcgs/processes to be OOM
>killed. This maps to jobs priorities essentially.

Killing processes purely in order of memory usage cannot effectively
protect important processes, while killing them in a strict user-defined
priority order can trigger a long series of OOM events and still fail to
free enough memory. I have been looking for a balance between the two
methods, so that the shortcomings of neither become too obvious (a rough
sketch of what I mean follows below). The biggest advantage of memcg is
its tree topology, and I also hope to make good use of it.

>If this approach works for you (or any other audience), that's great,
>I can share more details and perhaps we can reach something that we
>can both use :)

If you have a good idea, please share more details or show some code.
I would greatly appreciate it.
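To make the "balance" above more concrete, below is a small userspace
sketch. It is only an illustration of the idea, not my patch code and
not any existing kernel interface; the names (struct group, protect,
oom_score) and the numbers are made up. Each group's usage is first
discounted by a user-configured protection quota, and the OOM killer
prefers the group with the largest unprotected usage, so a fully
protected group is only chosen when nothing else can free memory:

#include <stdio.h>

struct group {
	const char *name;
	unsigned long usage;    /* bytes currently charged to the group */
	unsigned long protect;  /* user-configured protection quota     */
};

/* Unprotected usage: how much the group still "owes" under OOM. */
static unsigned long oom_score(const struct group *g)
{
	return g->usage > g->protect ? g->usage - g->protect : 0;
}

int main(void)
{
	struct group groups[] = {
		{ "latency-critical", 8UL << 30, 8UL << 30 }, /* fully protected   */
		{ "batch",            6UL << 30, 1UL << 30 }, /* lightly protected */
		{ "best-effort",      2UL << 30, 0 },         /* unprotected       */
	};
	const struct group *victim = &groups[0];
	size_t i;

	for (i = 1; i < sizeof(groups) / sizeof(groups[0]); i++)
		if (oom_score(&groups[i]) > oom_score(victim))
			victim = &groups[i];

	/* "batch" is chosen: it has the most usage above its protection. */
	printf("would kill from: %s (unprotected usage %lu MiB)\n",
	       victim->name, oom_score(victim) >> 20);
	return 0;
}

This is of course a big simplification (no per-process scores, no
hierarchy), but it shows why such a scheme sits between the pure
usage-based ordering and the pure priority-based ordering.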
>> >This has been brought up before in other discussions without much
>> >interest [1], but just thought it may be relevant here.
>> >
>> >[1]https://lore.kernel.org/lkml/CAHS8izN3ej1mqUpnNQ8c-1Bx5EeO7q5NOkh0qrY_4PLqc8rkHA@xxxxxxxxxxxxxx/#t

-- 
Thanks for your comment!
chengkaitao