Re: [PATCH v2] mm, memcg: skip killing processes under memcg protection at first scan

Yafang Shao <laoar.shao@xxxxxxxxx> · Tue, 20 Aug 2019 15:49:20 +0800

On Tue, Aug 20, 2019 at 3:27 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 20-08-19 15:15:54, Yafang Shao wrote:
> > On Tue, Aug 20, 2019 at 2:40 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Tue 20-08-19 09:16:01, Yafang Shao wrote:
> > > > On Tue, Aug 20, 2019 at 5:12 AM Roman Gushchin <guro@xxxxxx> wrote:
> > > > >
> > > > > On Sun, Aug 18, 2019 at 09:18:06PM -0400, Yafang Shao wrote:
> > > > > > In the current memory.min design, when the system is short of free
> > > > > > memory it invokes the OOM killer instead of reclaiming the
> > > > > > reclaimable pages protected by memory.min. Under this condition,
> > > > > > however, the OOM killer may kill processes in the very memcg that
> > > > > > memory.min protects. This behavior is very weird.
> > > > > > To make it more reasonable, this patch changes the OOM killer to
> > > > > > do a two-round scan. It skips the processes under memcg protection
> > > > > > in the first scan, and if it can't kill any process it rescans all
> > > > > > the processes.
> > > > > >
> > > > > > Regarding the overhead this change may add, I don't think it will be
> > > > > > a problem, because the rescan only happens under system memory
> > > > > > pressure when the OOM killer can't find any suitable victim outside
> > > > > > memcg protection.
> > > > >
> > > > > Hi Yafang!
> > > > >
> > > > > The idea makes sense at first glance, but I'm worried about
> > > > > mixing per-memcg and per-process characteristics.
> > > > > Actually, it raises many questions:
> > > > > 1) if we do respect memory.min, why not memory.low too?
> > > >
> > > > memory.low is different from memory.min, as the OOM killer will not be
> > > > invoked when it is reached.
> > >
> > > Responded in other email thread (please do not post two versions of the
> > > patch on the same day because it makes conversation too scattered and
> > > confusing).
> > >
> > (This is a time zone issue :-) )
>
> Normally we wait a few days until feedback on the particular patch has
> settled before a new version is posted.
>
> > > Think of min limit protection as some sort of a more intelligent mlock.
> >
> > From my perspective, it is a less intelligent mlock, because what it
> > protects may be garbage memory.
> > As I said before, what it protects is memory usage, rather than
> > specific file memory, anon memory, or something else.
> >
> > Its advantage is that it is easy to use.
> >
> > > It protects from regular memory reclaim and it can lead to an OOM
> > > situation (be it global or memcg), but it by no means prevents the
> > > system from killing the workload if there is a need. Those two
> > > decisions are simply orthogonal IMHO. The latter is an emergency action
> > > while the former helps guarantee the runtime behavior of the workload.
> > >
> >
> > If it can also handle OOM memory reclaim, it will be more intelligent.
>
> Can we get back to an actual usecase please?
>

No real usecase.
What concerns us is that memory.min can lead to more OOMs but can't
protect itself during an OOM, which seems a little weird.
Setting oom_score_adj is another option, but there's no memcg-level
oom_score_adj.
memory.min is memcg-level while oom_score_adj is process-level, which
is weird as well.
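The two-round scan from the patch description can be sketched as below. This is a simplified, user-space illustration under stated assumptions, not the patch's code: `struct oom_candidate`, its `memcg_protected` flag, and the helpers `scan_candidates()` and `select_oom_victim()` are hypothetical names, and a real implementation would walk the task list and score tasks with the kernel's own badness heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified OOM candidate (not the kernel's task_struct). */
struct oom_candidate {
	int pid;
	long badness;          /* higher means a better victim */
	bool memcg_protected;  /* e.g. usage within its memcg's memory.min */
};

/* One scan over all candidates. When skip_protected is true, tasks in a
 * protected memcg are ignored. Returns the index of the highest-badness
 * eligible task, or -1 if no task is eligible. */
static int scan_candidates(const struct oom_candidate *c, size_t n,
			   bool skip_protected)
{
	int victim = -1;

	for (size_t i = 0; i < n; i++) {
		if (skip_protected && c[i].memcg_protected)
			continue;
		if (victim < 0 || c[i].badness > c[victim].badness)
			victim = (int)i;
	}
	return victim;
}

/* Two-round selection: skip protected tasks first, and only if nothing
 * else is eligible, rescan every task. Returns an index or -1. */
int select_oom_victim(const struct oom_candidate *c, size_t n)
{
	assert(c != NULL || n == 0);

	int victim = scan_candidates(c, n, true);
	if (victim < 0)
		victim = scan_candidates(c, n, false);
	return victim;
}
```

With a protected task of badness 900 and unprotected tasks of badness 300 and 500, the first round picks the 500 task even though the protected one scores higher; the second round only runs when every candidate is protected. That dependence on protection state, not just on badness, is the less deterministic selection objected to below.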

> > > To be completely fair, the OOM killer is a sort of memory reclaim as
> > > well, so strictly speaking both mlock and memcg min protection could be
> > > considered. But from any practical aspect I can think of, I simply do not
> > > see a strong usecase that would justify more complex OOM behavior.
> > > People will simply be confused, because the selection becomes less
> > > deterministic.
> > > --
> >
> > So what about adjusting oom_score_adj automatically when we set
> > memory.min or mlock?
>
> oom_score_adj is a _user_ tuning. The kernel has no business in
> auto-tuning it. It should just consume the value.
>
> --
> Michal Hocko
> SUSE Labs
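Since the kernel only consumes oom_score_adj per process, a memcg-wide value like the one discussed in this thread has to be applied by userspace. The sketch below assumes a cgroup v2 `cgroup.procs` file (one PID per line) and writes the same value to `/proc/<pid>/oom_score_adj` for each listed process. The function names and the `proc_root` parameter (added so the code can be pointed away from the real `/proc`) are hypothetical. Processes that join the cgroup later are not covered, and writing a lower value may require CAP_SYS_RESOURCE.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Valid range accepted by /proc/<pid>/oom_score_adj. */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Write adj to <proc_root>/<pid>/oom_score_adj.
 * Returns 0 on success or a negative errno value. */
static int set_oom_score_adj(const char *proc_root, long pid, int adj)
{
	char path[256];
	FILE *f;
	int n, err = 0;

	assert(proc_root != NULL);
	if (adj < OOM_SCORE_ADJ_MIN || adj > OOM_SCORE_ADJ_MAX)
		return -EINVAL;

	n = snprintf(path, sizeof(path), "%s/%ld/oom_score_adj", proc_root, pid);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", adj) < 0)
		err = -EIO;
	/* stdio buffers the write, so a rejected value surfaces at fclose(). */
	if (fclose(f) != 0 && err == 0)
		err = -errno;
	return err;
}

/* Apply adj to every PID listed in a cgroup.procs file.
 * Returns the number of processes updated, or a negative errno value if
 * the list itself cannot be opened. Per-process failures (for example a
 * process that exited in the meantime) are skipped. */
int apply_oom_score_adj_to_cgroup(const char *procs_path,
				  const char *proc_root, int adj)
{
	FILE *procs = fopen(procs_path, "r");
	long pid;
	int updated = 0;

	if (!procs)
		return -errno;
	while (fscanf(procs, "%ld", &pid) == 1) {
		if (set_oom_score_adj(proc_root, pid, adj) == 0)
			updated++;
	}
	fclose(procs);
	return updated;
}
```

A call such as `apply_oom_score_adj_to_cgroup("/sys/fs/cgroup/mygroup/cgroup.procs", "/proc", -500)` (with `mygroup` a placeholder) would be the userspace counterpart of a memcg-level setting. Keeping this in userspace matches the position that oom_score_adj is a user tuning the kernel should only consume.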