On 2024/5/29 03:32, Yosry Ahmed wrote:
> On Tue, May 28, 2024 at 12:08 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>>
>> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>>>
>>> On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>>>>
>>>> On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
>>>>>> Some trivial folio conversions in zswap code.
>>>>>
>>>>> The three patches themselves look good.
>>>>>
>>>>>> The main reason I included a cover letter is that I wanted to get
>>>>>> feedback on what other trivial conversions can/should be done in
>>>>>> mm/zswap.c (keeping in mind that only order-0 folios are supported
>>>>>> anyway). These are the things I came across while searching for
>>>>>> 'page' in mm/zswap.c, and chose not to do anything about for now:
>>>>>
>>>>> I think there's a deeper question to answer before answering these
>>>>> questions, which is what we intend to do with large folios and zswap
>>>>> in the future. Do we intend to split them? Compress them as a large
>>>>> folio? Compress each page in a large folio separately? I can see an
>>>>> argument for choices 2 and 3, but I think choice 1 is going to be
>>>>> increasingly untenable.
>>>>
>>>> Yeah, I was kinda getting the small things out of the way so that
>>>> zswap is fully folio-ized before we think about large folios. I
>>>> haven't given it a lot of thought, but here's what I have in mind.
>>>>
>>>> Right now, I think most configs that enable zswap will disable
>>>> CONFIG_THP_SWAP (otherwise all THPs would go straight to disk), so
>>>> let's assume that today we are splitting large folios before they go
>>>> to zswap (i.e. choice 1).
>>>>
>>>> What we do next depends on how the core swap code intends to deal
>>>> with large folios. My understanding based on recent developments is
>>>> that we intend to swap out large folios as a whole, but I saw some
>>>> discussions about splitting all large folios before swapping them
>>>> out, or leaving them whole but swapping them out in order-0 chunks.
>>>>
>>>> I assume the rationale is that there is little benefit to keeping
>>>> the folios whole because they will most likely be freed soon anyway,
>>>> but I understand not wanting to spend time on splitting them, so
>>>> swapping them out in order-0 chunks makes some sense to me. It also
>>>> dodges the whole fragmentation issue.
>>>>
>>>> If we do either of these things in the core swap code, then I think
>>>> zswap doesn't need to do anything to support large folios. If not,
>>>> then we need to make a choice between choice 2 (compress large
>>>> folios as a whole) and choice 3 (compress each page separately), as
>>>> you mentioned.
>>>>
>>>> Compressing large folios as a whole means that we need to decompress
>>>> them as a whole to read a single page, which I think could be very
>>>> inefficient in some cases, or it would force us to swap in large
>>>> folios. Unless, of course, we end up in a world where we mostly swap
>>>> in the same large folios that we swapped out. Although there can be
>>>> additional compression savings from compressing large folios as a
>>>> whole.
>>>>
>>>> Hence, I think choice 3 is the most reasonable one, at least for the
>>>> short term. I also think this is what zram does, but I haven't
>>>> checked. Even if we all agree on this, there are still questions
>>>> that we need to answer.
>>>> For example, do we allocate zswap_entry's for each order-0 chunk
>>>> right away, or do we allocate a single zswap_entry for the entire
>>>> folio and then "split" it during swapin if we only need to read
>>>> part of the folio?
>>>>
>>>> Wondering what others think here.
>>>
>>> More thoughts that came to mind here:
>>>
>>> - Whether we go with choice 2 or 3, we may face a latency issue.
>>>   Zswap compression happens synchronously in the context of reclaim,
>>>   so if we start handling large folios in zswap, it may be more
>>>   efficient to do it asynchronously, like swap to disk.
>>
>> We've been discussing this in private as well :)
>>
>> It doesn't have to be these two extremes, right? I'm perfectly happy
>> with starting with compressing each subpage separately, but perhaps
>> we can consider managing larger folios in bigger chunks (say 64KB).
>> That way, on swap-in, we just have to bring a whole chunk in, not the
>> entire folio, and still take advantage of compression efficiencies on
>> bigger-than-one-page chunks. I'd also check with other filesystems
>> that leverage compression to see what their unit of compression is.
>
> Right. But I think it will be a clearer win to start with compressing
> each subpage separately, and it avoids splitting folios during reclaim
> to zswap. It also doesn't depend on the zsmalloc work.
>
> Once we have that, we can experiment with compressing folios in larger
> chunks. The tradeoffs become less clear at that point, and the number
> of variables you can tune goes up :)

Agree, it's a good approach! And it doesn't have any decompression
amplification problem.

Thanks.
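For reference, a rough sketch of what the store side of choice 3 (one
compressed object per subpage) could look like. To be clear,
zswap_store_page() and zswap_invalidate_page() below are hypothetical
helper names used only for illustration, not existing mm/zswap.c
functions; folio_nr_pages() and folio_page() are the real folio APIs:

static bool zswap_store_folio(struct folio *folio)
{
	long nr_pages = folio_nr_pages(folio);
	long i;

	for (i = 0; i < nr_pages; i++) {
		/*
		 * Each order-0 subpage gets its own zswap_entry and
		 * compressed object, so swapin can decompress a single
		 * page without touching the rest of the folio.
		 */
		if (!zswap_store_page(folio_page(folio, i)))
			goto unwind;
	}
	return true;

unwind:
	/* Undo the subpages stored so far; fall back to disk swap. */
	while (i-- > 0)
		zswap_invalidate_page(folio_page(folio, i));
	return false;
}

Keeping one zswap_entry per order-0 chunk is what makes the
decompression-amplification concern go away: reading back one page
never requires decompressing its neighbors.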