Re: [PATCH RFC] mm: mitigate large folios usage and swap thrashing for nearly full memcg

On Thu, Oct 31, 2024 at 10:10 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> [..]
> > >>> A crucial component is still missing: managing the compression and
> > >>> decompression of multiple pages as a larger block. This could
> > >>> significantly reduce system time and potentially resolve the kernel
> > >>> build issue within a small memory cgroup, even with swap thrashing.
> > >>>
> > >>> I’ll send an update ASAP so you can rebase for zswap.
> > >>
> > >> Did you mean https://lore.kernel.org/all/20241021232852.4061-1-21cnbao@xxxxxxxxx/?
> > >> That won't benefit zswap, right?
> > >
> > > That's right. I assume we can also make it work with zswap?
> >
> > Hopefully yes. That's mainly why I was looking at that series, to try
> > and find a way to do something similar for zswap.
>
> I would prefer for these things to be done separately. We still need
> to evaluate the compression/decompression of large blocks. I am mainly
> concerned about having to decompress a large chunk to fault in one
> page.
>
> The obvious problems are fault latency, and wasted work having to
> repeatedly decompress the large chunk to take one page from it. We
> also need to decide if we'd rather split it after decompression and
> compress the parts that we didn't swap in separately.
>
> This can cause problems beyond the fault latency. Imagine the case
> where the system is under memory pressure, so we fallback to order-0
> swapin to avoid reclaim. Now we want to decompress a chunk that used
> to be 64K.

Yes, this could be an issue.

We actually tried using several buffers for those partial swap-in cases:
the decompressed data was held in anticipation of the upcoming swap-ins
of the sibling subpages. This approach could address the majority of
partial swap-ins in the fallback scenario. A rough sketch of the idea
follows.
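To make that concrete, here is a minimal, hypothetical sketch (not the
posted patch; names such as zswap_decomp_cache and zswap_load_from_cache
are made up for illustration) of caching the most recently decompressed
large block per CPU, so consecutive order-0 swap-ins of sibling subpages
hit the cache instead of decompressing the whole block again:

#include <linux/highmem.h>
#include <linux/percpu.h>
#include <linux/sizes.h>
#include <linux/swap.h>
#include <linux/swapops.h>

#define LARGE_BLOCK_SIZE        SZ_64K
#define SUBPAGES_PER_BLOCK      (LARGE_BLOCK_SIZE / PAGE_SIZE)

/* Most recently decompressed large block on this CPU. */
struct zswap_decomp_cache {
        swp_entry_t first_entry;        /* entry of subpage 0; 0 == empty */
        u8 *buf;                        /* LARGE_BLOCK_SIZE, preallocated */
};

static DEFINE_PER_CPU(struct zswap_decomp_cache, decomp_cache);

/*
 * Try to satisfy an order-0 swap-in from the cached block, so we
 * don't pay the full decompression latency for every sibling subpage.
 */
static bool zswap_load_from_cache(swp_entry_t entry, struct page *page)
{
        struct zswap_decomp_cache *c = &get_cpu_var(decomp_cache);
        unsigned long off = swp_offset(entry) - swp_offset(c->first_entry);
        bool hit = c->first_entry.val &&
                   swp_type(entry) == swp_type(c->first_entry) &&
                   off < SUBPAGES_PER_BLOCK;

        if (hit)
                memcpy_to_page(page, 0,
                               (const char *)c->buf + off * PAGE_SIZE,
                               PAGE_SIZE);

        put_cpu_var(decomp_cache);
        return hit;
}

The trade-off is only one block of staleness per CPU, but for the common
pattern of faulting neighbouring subpages back in order, most partial
swap-ins would be cache hits.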

>
> We need to allocate 64K of contiguous memory for a temporary
> allocation to be able to fault a 4K page. Now we either need to:
> - Go into reclaim, which we were trying to avoid to begin with.
> - Dip into reserves to allocate the 64K as it's a temporary
> allocation. This is probably risky because under memory pressure, many
> CPUs may be doing this concurrently.

This has been addressed by using contiguous memory prepared on a per-CPU
basis; search for the following in the patch:
"alloc_pages() might fail, so we don't depend on allocation:"
https://lore.kernel.org/all/20241021232852.4061-1-21cnbao@xxxxxxxxx/
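A minimal sketch of that per-CPU preparation, assuming a 64K large block
(names like zswap_scratch and zswap_scratch_init are illustrative, not
taken from the patch):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/sizes.h>

static DEFINE_PER_CPU(u8 *, zswap_scratch);

static int __init zswap_scratch_init(void)
{
        unsigned int cpu;

        for_each_possible_cpu(cpu) {
                /* At init time, high-order allocations still succeed easily. */
                u8 *buf = (u8 *)__get_free_pages(GFP_KERNEL,
                                                 get_order(SZ_64K));
                if (!buf)
                        return -ENOMEM;
                per_cpu(zswap_scratch, cpu) = buf;
        }
        return 0;
}

/*
 * The swap-in path then decompresses into this CPU's scratch buffer
 * (with preemption disabled) instead of calling alloc_pages() in the
 * fault path, so no high-order allocation happens under memory
 * pressure and no reserves are dipped into.
 */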

Thanks
Barry
