On Tue, Aug 20, 2019 at 3:27 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 20-08-19 15:15:54, Yafang Shao wrote:
> > On Tue, Aug 20, 2019 at 2:40 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Tue 20-08-19 09:16:01, Yafang Shao wrote:
> > > > On Tue, Aug 20, 2019 at 5:12 AM Roman Gushchin <guro@xxxxxx> wrote:
> > > > >
> > > > > On Sun, Aug 18, 2019 at 09:18:06PM -0400, Yafang Shao wrote:
> > > > > > In the current memory.min design, the system is going to do OOM instead
> > > > > > of reclaiming the reclaimable pages protected by memory.min if the
> > > > > > system lacks free memory. Under this condition, the OOM
> > > > > > killer may kill the processes in the memcg protected by memory.min.
> > > > > > This behavior is very weird.
> > > > > > In order to make it more reasonable, I make some changes in the OOM
> > > > > > killer. In this patch, the OOM killer will do a two-round scan. It will
> > > > > > skip the processes under memcg protection in the first scan, and if it
> > > > > > can't kill any processes it will rescan all the processes.
> > > > > >
> > > > > > Regarding the overhead this change may take, I don't think it will be a
> > > > > > problem because this only happens under system memory pressure and
> > > > > > the OOM killer can't find any proper victims which are not under memcg
> > > > > > protection.
> > > > >
> > > > > Hi Yafang!
> > > > >
> > > > > The idea makes sense at first glance, but actually I'm worried
> > > > > about mixing per-memcg and per-process characteristics.
> > > > > Actually, it raises many questions:
> > > > > 1) if we do respect memory.min, why not memory.low too?
> > > >
> > > > memory.low is different from memory.min, as the OOM killer will not be
> > > > invoked when it is reached.
> > >
> > > Responded in other email thread (please do not post two versions of the
> > > patch on the same day because it makes conversation too scattered and
> > > confusing).
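
To make the two-round scan from the quoted patch description concrete, here is a minimal userspace sketch in Python. It is not the kernel patch itself: the `Task` class, the `badness` score, and the `protected` flag are hypothetical stand-ins for the kernel's per-process OOM score and for "the task's memcg is under memory.min protection".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Task:
    """Hypothetical stand-in for a process considered by the OOM killer."""
    pid: int
    badness: int      # illustrative score: higher means a better victim
    protected: bool   # assumed flag: task's memcg is under memory.min protection


def pick_victim(tasks: Sequence[Task], skip_protected: bool) -> Optional[Task]:
    """Return the highest-badness task, optionally ignoring protected ones."""
    candidates = [t for t in tasks if not (skip_protected and t.protected)]
    return max(candidates, key=lambda t: t.badness, default=None)


def select_victim_two_round(tasks: Sequence[Task]) -> Optional[Task]:
    """Round one skips protected tasks; round two rescans every task."""
    victim = pick_victim(tasks, skip_protected=True)
    if victim is None:
        # No unprotected victim was found, so fall back to a full scan.
        victim = pick_victim(tasks, skip_protected=False)
    return victim
```

The second pass only runs when the first finds nothing, which is the basis for the claim that the extra overhead is limited to the case where every candidate is protected.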
> > >
> > (This is an issue about time zone :-) )
>
> Normally we wait a few days until feedback on the particular patch is
> settled before a new version is posted.
>
> > > Think of min limit protection as some sort of a more intelligent mlock.
> >
> > From my perspective, it is a less intelligent mlock, because what it
> > protects may be garbage memory.
> > As I said before, what it protects is the memory usage, rather than a
> > specified file memory or anon memory or something else.
> >
> > The advantage of it is that it is easy to use.
> >
> > > It protects from the regular memory reclaim and it can lead to the OOM
> > > situation (be it global or memcg) but by no means does it prevent
> > > the system from killing the workload if there is a need. Those two
> > > decisions are simply orthogonal IMHO. The latter is an emergency action
> > > while the former is to help guarantee a runtime behavior of the workload.
> > >
> >
> > If it can handle OOM memory reclaim, it will be more intelligent.
>
> Can we get back to an actual usecase please?
>

No real usecase.
What we are concerned about is that if it can lead to more OOMs but
can't protect itself in OOM, then this behavior seems a little weird.
Setting oom_score_adj is another choice, but there's no memcg-level
oom_score_adj. memory.min is memcg-level, while oom_score_adj is
process-level; that is weird as well.

> > > To be completely fair, the OOM killer is a sort of memory reclaim as
> > > well so strictly speaking both mlock and memcg min protection could be
> > > considered but from any practical aspect I can think of I simply do not
> > > see a strong usecase that would justify a more complex oom behavior.
> > > People will be simply confused that the selection is less deterministic
> > > and therefore more confusing.
> > > --
> >
> > So what about adjusting the oom_score_adj automatically when we set
> > memory.min or mlock?
>
> oom_score_adj is a _user_ tuning. The kernel has no business in
> auto-tuning it.
> It should just consume the value.
>
> --
> Michal Hocko
> SUSE Labs
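
Since oom_score_adj is a per-process value set from userspace, one way to approximate a memcg-level setting without kernel changes is to write it for every process listed in a cgroup's cgroup.procs file. The following is only a sketch under stated assumptions: a cgroup v2 directory path is passed in by the caller, and the function name `set_cgroup_oom_score_adj` and the `proc_root` parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path


def set_cgroup_oom_score_adj(cgroup_dir, value, proc_root="/proc"):
    """Write `value` to oom_score_adj for each PID listed in cgroup.procs.

    Returns the list of PIDs that were updated. PIDs whose file cannot be
    written (for example, the process already exited) are skipped.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    updated = []
    for entry in (Path(cgroup_dir) / "cgroup.procs").read_text().split():
        pid = int(entry)
        try:
            (Path(proc_root) / str(pid) / "oom_score_adj").write_text(f"{value}\n")
        except OSError:
            # Process exited or permission was denied; move on to the next PID.
            continue
        updated.append(pid)
    return updated
```

This is inherently racy: children forked later inherit the parent's value, but a process moved into the cgroup after the call is not covered, so an agent using this approach would have to re-run it.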