On Sun, Dec 24, 2023 at 2:07 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> On Sun, Dec 24, 2023 at 1:13 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
> >
> > On Sun, 24 Dec 2023, Chris Li wrote:
> >
> > > On Sat, Dec 23, 2023 at 7:01 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
> > > >
> > > > On Sat, 23 Dec 2023, Chris Li wrote:
> > > >
> > > > > > How do you quantify the impact of the delayed swap_entry_free()?
> > > > > >
> > > > > > Since the free and memcg uncharge are now delayed, is there not the
> > > > > > possibility that we stay under memory pressure for longer? (Assuming at
> > > > > > least some users are swapping because of memory pressure.)
> > > > > >
> > > > > > I would assume that since the free and uncharge itself is delayed that in
> > > > > > the pathological case we'd actually be swapping *more* until the async
> > > > > > worker can run.
> > > > >
> > > > > Thanks for raising this interesting question.
> > > > >
> > > > > First of all, swap_entry_free() does not impact "memory.current".
> > > > > It reduces "memory.swap.current". Technically it is swap pressure,
> > > > > not memory pressure, that suffers the extra delay.
> > > > >
> > > > > Secondly, we are talking about delaying up to 64 swap entries for a
> > > > > few microseconds.
> > > >
> > > > What guarantees that the async freeing happens within a few microseconds?
> > >
> > > The Linux kernel typically doesn't provide RT scheduling guarantees. You
> > > can change microseconds to milliseconds; my reasoning below still holds.
> >
> > What guarantees that the async freeing happens even within 10s? Your
> > responses imply that there is some deadline by which this freeing
> > absolutely must happen (us or ms), but I don't know of any strong
> > guarantees.
>
> I think we are in agreement there: there are no such strong guarantees
> in Linux scheduling. However, when there are free CPU resources, the
> job will get scheduled to execute in a reasonable time frame. If it
> does not, I consider it a bug that the pending jobs are unable to run
> for a long time while the CPU has idle resources. The existing code
> doesn't provide such a guarantee either; see my point below. I don't
> know why you are asking for such a guarantee.
>
> > If there are no absolute guarantees about when the freeing may now occur,
> > I'm asking how the impact of the delayed swap_entry_free() can be
> > quantified.
>
> Presumably each application has its own SLO metrics for monitoring its
> behavior. I am happy to take a look if any app has new SLO violations
> caused by this change. If you have a metric in mind, please name it so
> we can look at it together. In my current experiment and the Chromebook
> benchmark, I haven't seen such ill effects show up as statistically
> significant drops in the other metrics. That is not the same as saying
> such drops don't exist at all; I just haven't noticed any, and the SLO
> monitoring system hasn't caught any.
>
> > The benefit to the current implementation is that there *are* strong
> > guarantees about when the freeing will occur and cannot grow exponentially
> > before the async worker can do the freeing.
>
> I don't understand your point; please help me. In the current code, a
> previous swapin fault releases its swap slot into the swap slot cache.
> Say that slot remains in the cache for X seconds, until the Nth
> (N < 64) swapin page fault later fills the cache and all 64 cached
> swap slots are finally freed.
> Are you suggesting there is some kind of guarantee that X is less than
> some fixed bound in seconds? What is that bound, then? 10 seconds?
> 1 minute?
>
> BTW, there will be no exponential growth; that is guaranteed. Until the
> 64-entry cache is freed, the swapin code will take the direct free path
> for the swap slot currently in hand. The direct free path existed
> before my change.

FWIW, it's 64 * the number of CPUs.
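To make the behavior being debated concrete, here is a minimal
user-space sketch of that free path, modeled loosely on
free_swap_slot() in mm/swap_slots.c. This is a simulation under
simplifying assumptions, not the kernel code: the per-CPU handling,
locking, and the swapcache_free_entries() internals are elided, and
direct_free()/batch_free() are illustrative stand-ins.

#include <stdbool.h>
#include <stdio.h>

#define SWAP_SLOTS_CACHE_SIZE 64	/* one batch, as in the kernel */

struct slots_cache {
	unsigned long slots_ret[SWAP_SLOTS_CACHE_SIZE];
	int n_ret;	/* entries currently parked in the cache */
	bool usable;	/* stand-in for the trylock/init checks */
};

/* Stand-in for freeing one entry immediately (the direct free path). */
static void direct_free(unsigned long entry)
{
	printf("direct free of entry %lu\n", entry);
}

/* Stand-in for releasing a whole batch of entries at once. */
static void batch_free(unsigned long *entries, int n)
{
	printf("batch free of %d entries (first: %lu)\n", n, entries[0]);
}

/*
 * Sketch of the free path: a freed slot normally parks in the cache
 * until the 64th free fills it, and only then is the whole batch
 * released.  An entry can therefore wait an unbounded time X for
 * later swapins to fill the cache.  If the cache is unusable, the
 * entry takes the direct free path instead.
 */
static void free_swap_slot(struct slots_cache *cache, unsigned long entry)
{
	if (!cache->usable) {
		direct_free(entry);
		return;
	}
	cache->slots_ret[cache->n_ret++] = entry;
	if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
		batch_free(cache->slots_ret, cache->n_ret);
		cache->n_ret = 0;
	}
}

int main(void)
{
	struct slots_cache cache = { .usable = true };
	unsigned long entry;

	/* 130 frees: two full batches of 64, 2 entries left parked. */
	for (entry = 0; entry < 130; entry++)
		free_swap_slot(&cache, entry);
	printf("%d entries still pending in the cache\n", cache.n_ret);
	return 0;
}

Since the cache is per CPU in the kernel, at most 64 entries can be
parked on each CPU, which is where the 64 * nr_cpus system-wide bound
comes from; the open question in this thread is only how long an
individual entry can sit in a partially filled cache.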