At 2023-05-17 16:09:50, "Yosry Ahmed" <yosryahmed@xxxxxxxxxx> wrote:
>On Wed, May 17, 2023 at 1:01 AM 程垲涛 Chengkaitao Cheng
><chengkaitao@xxxxxxxxxxxxxx> wrote:
>>
>> At 2023-05-17 14:59:06, "Yosry Ahmed" <yosryahmed@xxxxxxxxxx> wrote:
>> >+David Rientjes
>> >
>> >On Tue, May 16, 2023 at 8:20 PM chengkaitao <chengkaitao@xxxxxxxxxxxxxx> wrote:
>> >>
>> >> Establish a new OOM score algorithm, supports the cgroup level OOM
>> >> protection mechanism. When an global/memcg oom event occurs, we treat
>> >> all processes in the cgroup as a whole, and OOM killers need to select
>> >> the process to kill based on the protection quota of the cgroup.
>> >>
>> >
>> >Perhaps this is only slightly relevant, but at Google we do have a
>> >different per-memcg approach to protect from OOM kills, or more
>> >specifically tell the kernel how we would like the OOM killer to
>> >behave.
>> >
>> >We define an interface called memory.oom_score_badness, and we also
>> >allow it to be specified per-process through a procfs interface,
>> >similar to oom_score_adj.
>> >
>> >These scores essentially tell the OOM killer the order in which we
>> >prefer memcgs to be OOM'd, and the order in which we want processes in
>> >the memcg to be OOM'd. By default, all processes and memcgs start with
>> >the same score. Ties are broken based on the rss of the process or the
>> >usage of the memcg (prefer to kill the process/memcg that will free
>> >more memory) -- similar to the current OOM killer.
>>
>> Thank you for providing a new application scenario. You have described a
>> new per-memcg approach, but a simple introduction cannot explain the
>> details of your approach clearly. If you could compare and analyze my
>> patches for possible defects, or if your new approach has advantages
>> that my patches do not have, I would greatly appreciate it.
>
>Sorry if I was not clear, I am not implying in any way that the
>approach I am describing is better than your patches. I am guilty of
>not conducting the proper analysis you are requesting.

No approach is perfect, and I am asking for your advice so that I can
learn. You don't need to say sorry; I should say thank you.

>I just saw the thread and thought it might be interesting to you or
>others to know the approach that we have been using for years in our
>production. I guess the target is the same, be able to tell the OOM
>killer which memcgs/processes are more important to protect. The
>fundamental difference is that instead of tuning this based on the
>memory usage of the memcg (your approach), we essentially give the OOM
>killer the ordering in which we want memcgs/processes to be OOM
>killed. This maps to jobs priorities essentially.

Killing processes purely in order of memory usage cannot effectively
protect important processes, while killing them in a strict user-defined
priority order can trigger a long series of OOM events and still fail to
free enough memory. I have been looking for a balance between the two
methods, so that the shortcomings of neither become too obvious (a rough
sketch of what I mean follows below). The biggest advantage of memcg is
its tree topology, and I also hope to make good use of it.

>If this approach works for you (or any other audience), that's great,
>I can share more details and perhaps we can reach something that we
>can both use :)

If you have a good idea, please share more details or show some code.
I would greatly appreciate it.
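To make the "balance" above more concrete, below is a small userspace
sketch. It is only an illustration of the idea, not my patch code and
not any existing kernel interface; the names (struct group, protect,
oom_score) and the numbers are made up. Each group's usage is first
discounted by a user-configured protection quota, and the OOM killer
prefers the group with the largest unprotected usage, so a fully
protected group is only chosen when nothing else can free memory:

#include <stdio.h>

struct group {
	const char *name;
	unsigned long usage;    /* bytes currently charged to the group */
	unsigned long protect;  /* user-configured protection quota     */
};

/* Unprotected usage: how much the group still "owes" under OOM. */
static unsigned long oom_score(const struct group *g)
{
	return g->usage > g->protect ? g->usage - g->protect : 0;
}

int main(void)
{
	struct group groups[] = {
		{ "latency-critical", 8UL << 30, 8UL << 30 }, /* fully protected   */
		{ "batch",            6UL << 30, 1UL << 30 }, /* lightly protected */
		{ "best-effort",      2UL << 30, 0 },         /* unprotected       */
	};
	const struct group *victim = &groups[0];
	size_t i;

	for (i = 1; i < sizeof(groups) / sizeof(groups[0]); i++)
		if (oom_score(&groups[i]) > oom_score(victim))
			victim = &groups[i];

	/* "batch" is chosen: it has the most usage above its protection. */
	printf("would kill from: %s (unprotected usage %lu MiB)\n",
	       victim->name, oom_score(victim) >> 20);
	return 0;
}

This is of course a big simplification (no per-process scores, no
hierarchy), but it shows why such a scheme sits between the pure
usage-based ordering and the pure priority-based ordering.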
>> >This has been brought up before in other discussions without much
>> >interest [1], but just thought it may be relevant here.
>> >
>> >[1]https://lore.kernel.org/lkml/CAHS8izN3ej1mqUpnNQ8c-1Bx5EeO7q5NOkh0qrY_4PLqc8rkHA@xxxxxxxxxxxxxx/#t

-- 
Thanks for your comment!
chengkaitao