Re: [PATCH v2 0/3] mm: tlb swap entries batch async release

Barry Song <21cnbao@xxxxxxxxx> · Wed, 7 Aug 2024 06:21:20 +0800

On Wed, Aug 7, 2024 at 6:39 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 7 Aug 2024 04:32:09 +0800 Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> > > > their independent mm, rather than parent and child processes share the
> > > > same mm. Therefore, when the kernel executes multiple exiting process
> > > > simultaneously, they will definitely occupy multiple CPU core resources
> > > > to complete it.
> > >
> > > What I'm asking is why not change those userspace processes so that they
> > > fork off a child process which shares the MM (shared mm_struct) and
> > > then the original process exits, leaving the asynchronously-running
> > > child to clean up the MM resources.
> >
> > Not Zhiguo. From my perspective as a phone engineer, this issue isn't related
> > to the parent-child process or the wait() function. Phones rely heavily on
> > mechanisms similar to the OOM killer to function efficiently. For instance,
> > if you're using apps like YouTube, TikTok, and Facebook, and then you
> > open the camera app to take a photo, the camera app becomes the foreground
> > process and demands a lot of memory. In this scenario, the phone might
> > decide to terminate the most memory-consuming and less important apps,
> > such as TikTok or YouTube, to free up memory for the camera app. TikTok
> > and YouTube become less important because they are no longer occupying
> > the phone's screen and have moved to the background. The faster TikTok
> > and YouTube can be unmapped, the quicker the camera app can launch,
> > enhancing the user experience.
>
> I don't see how this relates to my question.
>
> Userspace can arrange for these resources to be released in an
> asynchronous fashion (can't it?).  So why change the kernel to do that?

I don't believe that userspace can distinguish between swap entries
and PTEs that point to folios.

If we are killing TikTok now, we will be performing munmap
and zap_pte_range(). The PTEs for TikTok might look like this:

PTE0 - page
PTE1 - swap
PTE2 - swap
PTE3 - page
PTE4 - swap
PTE5 - swap
PTE6 - swap
PTE7 - page

Currently, zap_pte_range() is freeing all PTEs one by one. While PTE0,
PTE3, and PTE7 can contribute to freeing memory and help accelerate
the launch of the new camera app, PTE1, PTE2, PTE4, PTE5, and
PTE6 do not. They are blocking the memory release of PTE0, PTE3,
and PTE7.

By handling this in kernel space, freeing memory and releasing
swaps won't block each other:

T1                       T2
PTE0 - page              PTE1 - swap
PTE3 - page              PTE2 - swap
PTE7 - page              PTE4 - swap
...
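
To make the split concrete, here is a toy userspace model. It is only a
sketch, not kernel code and not what the patch does: every name in it
(pte_kind, steps_until_pages_freed_inline, zap_split) is hypothetical,
and it assumes each page free or swap release costs one step while
queuing a swap entry for later costs nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
enum pte_kind { PTE_PAGE, PTE_SWAP };

/*
 * Inline model: one context handles every entry in order, one step each.
 * Returns how many steps have elapsed when the last page has been freed.
 */
static size_t steps_until_pages_freed_inline(const enum pte_kind *ptes,
                                             size_t n)
{
    size_t last = 0;

    for (size_t i = 0; i < n; i++) {
        if (ptes[i] == PTE_PAGE)
            last = i + 1;   /* all entries up to i were handled first */
    }
    return last;
}

/*
 * Split model: the zapping context (T1) frees pages itself and only
 * records the index of each swap entry in `deferred`, so that another
 * context (T2) can release them later. Returns the number of pages T1
 * freed, which under the cost assumption above is also the number of
 * steps T1 spent. The number of queued entries goes to *nr_deferred.
 */
static size_t zap_split(const enum pte_kind *ptes, size_t n,
                        size_t *deferred, size_t cap, size_t *nr_deferred)
{
    size_t pages = 0;

    *nr_deferred = 0;
    for (size_t i = 0; i < n; i++) {
        if (ptes[i] == PTE_PAGE) {
            pages++;
        } else {
            assert(*nr_deferred < cap);   /* caller sized the queue */
            deferred[(*nr_deferred)++] = i;
        }
    }
    return pages;
}
```

With the eight-entry layout above, the inline model needs 8 steps before
PTE7's page is freed, while the split model spends 3 steps in T1 and
leaves 5 swap entries queued for T2.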

On phones, over 60% of an app's memory could be in swap. Releasing
those swap entries inline is what obstructs the normal memory release
for munmap.

Thanks
Barry