Re: [RFC] Mechanism to induce memory reclaim

Michal Hocko <mhocko@xxxxxxxx> · Tue, 8 Mar 2022 13:52:33 +0100

On Mon 07-03-22 18:31:41, Shakeel Butt wrote:
> On Mon, Mar 07, 2022 at 03:41:45PM +0100, Michal Hocko wrote:
> > On Sun 06-03-22 15:11:23, David Rientjes wrote:
> > [...]
> > > Some questions to get discussion going:
> > >
> > >  - Overall feedback or suggestions for the proposal in general?
> 
> > Do we really need this interface? What would be the use cases which cannot
> > use the existing interfaces we have for that? Most notably memcg and
> > its high limit?
> 
> 
> Let me take a stab at this. The specific reasons why the high limit is not
> a good interface for implementing proactive reclaim:
> 
> 1) It can cause allocations from the target application to get
> throttled.
> 
> 2) It leaves state (the high limit) in the kernel which needs to be reset
> by the userspace part of the proactive reclaimer.
> 
> If I remember correctly, Facebook actually tried to use the high limit to
> implement proactive reclaim, but due to exactly these limitations [1]
> they went with the approach [2] that is aligned with this proposal.
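To make points (1) and (2) concrete, a userspace reclaimer driving reclaim through memory.high needs two writes: lower the limit, then restore it. The following is a minimal sketch under stated assumptions, not Facebook's or anyone's actual reclaimer. It assumes only the cgroup v2 memory.high file (which accepts a byte count or "max"); the helper names, the settle period and the restore value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Overwrite a cgroup interface file (or any file) with one value. */
static bool write_knob(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return false;
    bool ok = fprintf(f, "%s\n", value) > 0;
    if (fclose(f) != 0) /* buffered write errors surface at close */
        ok = false;
    return ok;
}

/* Read a single unsigned number, e.g. from memory.current. */
static bool read_u64_knob(const char *path, unsigned long long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    bool ok = fscanf(f, "%llu", out) == 1;
    fclose(f);
    return ok;
}

/*
 * Hypothetical proactive reclaim step through memory.high:
 *  1. lower the limit below current usage so the kernel reclaims; this
 *     may also throttle the workload's own allocations (problem 1);
 *  2. after a settle period, restore the previous value.
 * If the reclaimer dies between the two writes, the lowered limit stays
 * in the kernel (problem 2).
 */
static bool reclaim_via_high(const char *high_path,
                             unsigned long long usage_bytes,
                             unsigned long long reclaim_bytes,
                             const char *restore_value,
                             unsigned int settle_seconds)
{
    char target[32];
    unsigned long long limit;

    assert(restore_value != NULL);
    limit = reclaim_bytes < usage_bytes ? usage_bytes - reclaim_bytes : 0;
    snprintf(target, sizeof(target), "%llu", limit);

    if (!write_knob(high_path, target))          /* state now lives in the kernel */
        return false;
    sleep(settle_seconds);                       /* crash window */
    return write_knob(high_path, restore_value); /* undo the state */
}
```

In practice restore_value would be whatever memory.high held before the step (often "max"); the reclaimer has to remember it, and nothing restores it if the process is killed during the sleep.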

I do remember that we have discussed this in the past. There were proposals
for an additional limit to trigger background reclaim [3] or to add a
pressure-based memcg knob [4]. For the nr_to_reclaim-based interface,
some challenges were outlined in that email thread. I do understand that
practical experience could have confirmed or diminished those concerns.

I am definitely happy to restart those discussions, but it would be really
great to summarize the existing options and why they do not work in
practice. It would also be great to explain why the concerns about an
nr_to_reclaim-based interface expressed in the past no longer hold with
respect to the other proposals.

> To further explain why the above limitations are pretty bad: proactive
> reclaimers usually use a feedback loop to decide how much to squeeze
> from the target applications without impacting their performance, or
> while keeping the impact within a tolerable range. The metrics used for
> the feedback loop are either refaults or PSI, and these metrics become
> messy when the application gets throttled by the high limit.
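A PSI-driven feedback loop of the kind described above might look roughly like this sketch, which reads the "some" line of a cgroup v2 memory.pressure file ("some avg10=... avg60=... avg300=... total=...") and adjusts the next reclaim step. The step policy (grow additively while avg10 stays under a tolerated percentage, halve otherwise) and all names are hypothetical. The relevant point from the thread is that if memory.high throttling also feeds these numbers, the controller cannot tell the cost of its own reclaim apart from the throttling it caused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct psi_line {
    double avg10, avg60, avg300;  /* % of wall time stalled over 10/60/300 s */
    unsigned long long total_us;  /* cumulative stall time in microseconds */
};

/* Parse the "some" line of a memory.pressure file; rejects "full" lines. */
static bool parse_psi_some(const char *line, struct psi_line *out)
{
    return sscanf(line, "some avg10=%lf avg60=%lf avg300=%lf total=%llu",
                  &out->avg10, &out->avg60, &out->avg300,
                  &out->total_us) == 4;
}

/*
 * Hypothetical step policy (units chosen by the caller, e.g. pages):
 * probe for more savings while stalls stay tolerable, back off fast
 * once the workload starts stalling.
 */
static unsigned long long next_reclaim_step(unsigned long long step,
                                            double avg10,
                                            double tolerated_pct,
                                            unsigned long long min_step,
                                            unsigned long long max_step)
{
    assert(min_step > 0 && min_step <= max_step);
    if (avg10 > tolerated_pct)
        step /= 2;        /* reclaim is hurting: halve the step */
    else
        step += min_step; /* still tolerable: squeeze a bit more */
    if (step < min_step)
        step = min_step;
    if (step > max_step)
        step = max_step;
    return step;
}
```

Refault counters could replace avg10 as the input signal; the same confusion applies to them if throttling changes how the workload faults.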

One thing is not really clear to me here. IIUC, you are saying that the
PSI/refault metrics are influenced by the throttling. Does that mean that
your reclaimer is running outside of the controlled memcg? Otherwise, why
does it make any difference, from the metrics POV, who is reclaiming the
memory? I do understand that you want to avoid throttling the regular
workload in that memcg, and this is where the high limit falls short, but
the work has to be done by somebody, right?

> For (2), the high limit is a very awkward interface for doing proactive
> reclaim. If the userspace proactive reclaimer fails or crashes for
> whatever reason while triggering reclaim in an application, it can leave
> the application in a bad state (under memory pressure and throttled) for
> a long time.

Fair enough.
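By contrast, an nr_to_reclaim-style request is a single stateless write. The sketch below assumes a hypothetical per-cgroup file that accepts a byte count and reclaims roughly that much once; the function name, path and input format are assumptions for illustration, not an existing interface described in this thread. If the reclaimer crashes right after the call, no lowered limit is left behind for anyone to reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical one-shot reclaim request: write the number of bytes to
 * reclaim into 'reclaim_path'. Unlike the memory.high approach there is
 * no second "restore" write, so there is no crash window.
 */
static bool request_reclaim(const char *reclaim_path, unsigned long long bytes)
{
    assert(bytes > 0);
    FILE *f = fopen(reclaim_path, "w");
    if (!f)
        return false;
    bool ok = fprintf(f, "%llu\n", bytes) > 0;
    if (fclose(f) != 0) /* a kernel knob would report failure here */
        ok = false;
    return ok;
}
```

The open questions raised earlier (how much to ask for, and how to account the reclaim work) remain with the caller; this only removes the leftover-state problem.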

> [1] https://lore.kernel.org/all/20200928210216.GA378894@xxxxxxxxxxx/
> [2] https://dl.acm.org/doi/10.1145/3503222.3507731 (Section 3.3)

[3] http://lkml.kernel.org/r/20200922190859.GH12990@xxxxxxxxxxxxxx
    resp. http://lkml.kernel.org/r/20200219181219.54356-1-hannes@xxxxxxxxxxx/
[4] http://lkml.kernel.org/r/20200928210216.GA378894@xxxxxxxxxxx
-- 
Michal Hocko
SUSE Labs