> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Monday, December 2, 2024 9:50 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Chengming Zhou <chengming.zhou@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx; hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx;
> akpm@xxxxxxxxxxxxxxxxxxxx; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
>
> On Mon, Dec 2, 2024 at 8:19 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Chengming Zhou <chengming.zhou@xxxxxxxxx>
> > > Sent: Monday, December 2, 2024 7:06 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; Yosry Ahmed
> > > <yosryahmed@xxxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > > hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> > > ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> > > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh
> > > <vinodh.gopal@xxxxxxxxx>
> > > Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> > > simplifications for batching.
> > >
> > > On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> > > > Hi Chengming, Yosry,
> > > >
> > > >> -----Original Message-----
> > > >> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > > >> Sent: Monday, December 2, 2024 11:33 AM
> > > >> To: Chengming Zhou <chengming.zhou@xxxxxxxxx>
> > > >> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>;
> > > >> linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; hannes@xxxxxxxxxxx;
> > > >> nphamcs@xxxxxxxxx; usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx;
> > > >> 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; Feghali, Wajdi K
> > > >> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > > >> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> > > >> simplifications for batching.
> > > >>
> > > >> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> > > >> <chengming.zhou@xxxxxxxxx> wrote:
> > > >>>
> > > >>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > > >>>> In order to set up zswap_store_pages() to enable a clean batching
> > > >>>> implementation in [1], this patch implements the following changes:
> > > >>>>
> > > >>>> 1) Addition of zswap_alloc_entries(), which allocates zswap entries
> > > >>>>    upfront for all pages in the specified range of the folio. If
> > > >>>>    this fails, we return an error status to zswap_store().
> > > >>>>
> > > >>>> 2) Addition of zswap_compress_pages(), which calls zswap_compress()
> > > >>>>    for each page and returns false if any zswap_compress() fails, so
> > > >>>>    zswap_store_page() can clean up the allocated resources and
> > > >>>>    return an error status to zswap_store().
> > > >>>>
> > > >>>> 3) A "store_pages_failed" label that is a catch-all for all failure
> > > >>>>    points in zswap_store_pages(). This facilitates cleaner error
> > > >>>>    handling within zswap_store_pages(), which will become important
> > > >>>>    for IAA compress batching in [1].
> > > >>>>
> > > >>>> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> > > >>>>
> > > >>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx>
> > > >>>> ---
> > > >>>>  mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
> > > >>>>  1 file changed, 71 insertions(+), 22 deletions(-)
> > > >>>>
> > > >>>> diff --git a/mm/zswap.c b/mm/zswap.c
> > > >>>> index b09d1023e775..db80c66e2205 100644
> > > >>>> --- a/mm/zswap.c
> > > >>>> +++ b/mm/zswap.c
> > > >>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
> > > >>>>  /*********************************
> > > >>>>  * main API
> > > >>>>  **********************************/
> > > >>>>
> > > >>>> +static bool zswap_compress_pages(struct page *pages[],
> > > >>>> +                                 struct zswap_entry *entries[],
> > > >>>> +                                 u8 nr_pages,
> > > >>>> +                                 struct zswap_pool *pool)
> > > >>>> +{
> > > >>>> +        u8 i;
> > > >>>> +
> > > >>>> +        for (i = 0; i < nr_pages; ++i) {
> > > >>>> +                if (!zswap_compress(pages[i], entries[i], pool))
> > > >>>> +                        return false;
> > > >>>> +        }
> > > >>>> +
> > > >>>> +        return true;
> > > >>>> +}
> > > >>>
> > > >>> How about introducing a `zswap_compress_folio()` interface which
> > > >>> can be used by `zswap_store()`?
> > > >>> ```
> > > >>> zswap_store()
> > > >>>         nr_pages = folio_nr_pages(folio)
> > > >>>
> > > >>>         entries = zswap_alloc_entries(nr_pages)
> > > >>>
> > > >>>         ret = zswap_compress_folio(folio, entries, pool)
> > > >>>
> > > >>>         // store entries into xarray and LRU list
> > > >>> ```
> > > >>>
> > > >>> And this version of `zswap_compress_folio()` is very simple for now:
> > > >>> ```
> > > >>> zswap_compress_folio()
> > > >>>         nr_pages = folio_nr_pages(folio)
> > > >>>
> > > >>>         for (index = 0; index < nr_pages; ++index) {
> > > >>>                 struct page *page = folio_page(folio, index);
> > > >>>
> > > >>>                 if (!zswap_compress(page, &entries[index], pool))
> > > >>>                         return false;
> > > >>>         }
> > > >>>
> > > >>>         return true;
> > > >>> ```
> > > >>> This can be easily extended to support your "batched" version.
> > > >>>
> > > >>> Then the old `zswap_store_page()` could be removed.
> > > >>>
> > > >>> The good point is simplicity: we don't need to slice the folio
> > > >>> into multiple batches, then repeat the common operations for each
> > > >>> batch, like preparing entries and storing into the xarray and LRU list...
> > > >>
> > > >> +1
> > > >
> > > > Thanks for the code review comments. One question though: would
> > > > it make sense to trade off the memory footprint cost against the code
> > > > simplification? For instance, let's say we want to store a 64k folio.
> > > > We would allocate memory for 16 zswap entries, and if, say, one of
> > > > the compressions fails, we would deallocate the memory for all 16
> > > > zswap entries. Could this lead to zswap_entry kmem_cache starvation
> > > > and subsequent zswap_store() failures in multiple-process scenarios?
> > >
> > > Ah, I get your consideration. But it's the unlikely case, right?
> > >
> > > If the case you mentioned above happens a lot, I think yes, we should
> > > optimize its memory footprint to avoid the allocation and deallocation.
> >
> > Thanks Chengming. I see your point. Let me gather performance data
> > for the two options, and share.
>
> Yeah, I think we shouldn't optimize for the uncommon case, not until
> there's a real problem that needs fixing.

Agreed.

> > > On the other hand, we should consider that a folio will be compressed
> > > successfully in most cases.
> > > So we have to allocate all entries eventually.
> > >
> > > Based on your consideration, I think your way is ok too, although
> > > I think patch 2/2 should be dropped, since it hides the pages loop
> > > in smaller functions, as Yosry mentioned too.
> >
> > My main intent with patch 2/2 was to set up the error handling
> > path to be common and simpler, whether errors are encountered
> > during compression, zpool_malloc(), or the xarray store. Hence, I
> > initialize the allocated zswap_entry's handle in zswap_alloc_entries()
> > to ERR_PTR(-EINVAL), so it is easy for the common error handling code
> > in patch 2 to determine whether the handle was allocated (and hence
> > needs to be freed). This benefits the batching code by eliminating the
> > need to maintain state as to which stage of zswap_store_pages() saw an
> > error, based on which resources would need to be freed.
> >
> > My key consideration is to keep the batching error handling code simple,
> > hence these changes in patch 2. The changes described above would
> > help batching, and should not impact the non-batching case, as indicated
> > by the regression testing data I've included in the cover letter.
> >
> > I don't mind inlining the implementation of the helper functions, as I
> > mentioned in my response to Yosry. I am hoping the error handling
> > simplifications are acceptable, since they will help the batching code.
>
> I think having the loops open-coded should still be better than
> separate helpers. But I understand not wanting to have the loops
> directly in zswap_store(), as the error handling would be simpler if
> we do it in a separate function like zswap_store_pages().
>
> How about we just move the loop from zswap_store() to
> zswap_store_page() and call it zswap_store_folio()? When batching is
> added, I imagine we may need to split the loop into two loops before
> and after zswap_compress_folio(), which isn't very neat but is
> probably fine.

Sure, this sounds like a good way to organize the code. I will proceed
as suggested; a rough sketch of the resulting structure follows below.

Thanks,
Kanchana
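For readers following along, Chengming's pseudocode above maps to C roughly
as follows. This is a sketch only, assuming the entries[] layout and the
zswap_compress(page, entry, pool) signature from the quoted patch; it is not
the code that was merged:

```
/*
 * Sketch of the suggested zswap_compress_folio() (illustrative only).
 * The sequential per-page loop is the natural point to later substitute
 * a batching backend (e.g. IAA) that compresses several pages at once.
 */
static bool zswap_compress_folio(struct folio *folio,
				 struct zswap_entry *entries[],
				 struct zswap_pool *pool)
{
	long nr_pages = folio_nr_pages(folio);
	long index;

	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);

		if (!zswap_compress(page, entries[index], pool))
			return false;
	}

	return true;
}
```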
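And a minimal sketch of the zswap_store_folio() shape Yosry suggests,
combining the two-loop split around zswap_compress_folio() with the
ERR_PTR(-EINVAL) handle sentinel described above, so a single catch-all
label can clean up without tracking which stage failed. Here
zswap_alloc_entries() returning a kmalloc'd array and zswap_store_entry_at()
are hypothetical placeholders, and the cleanup omits erasing any
already-published entries from the xarray:

```
static bool zswap_store_folio(struct folio *folio, struct zswap_pool *pool)
{
	long nr_pages = folio_nr_pages(folio);
	struct zswap_entry **entries;
	long i;

	/*
	 * Allocate all entries upfront; each handle starts out as
	 * ERR_PTR(-EINVAL), so cleanup can tell whether zpool memory was
	 * ever allocated for it.  (Hypothetical helper.)
	 */
	entries = zswap_alloc_entries(folio, nr_pages);
	if (!entries)
		return false;

	/* Loop 1: compression; a batching backend does these en masse. */
	if (!zswap_compress_folio(folio, entries, pool))
		goto store_folio_failed;

	/* Loop 2: publish each entry to the xarray and LRU list. */
	for (i = 0; i < nr_pages; i++) {
		if (!zswap_store_entry_at(folio, i, entries[i]))
			goto store_folio_failed;
	}

	kfree(entries);
	return true;

store_folio_failed:
	/*
	 * Catch-all cleanup: an entry whose handle still holds the
	 * sentinel was never compressed, so only free handles that were
	 * actually allocated.  (Erasing entries already stored in the
	 * xarray is omitted here for brevity.)
	 */
	for (i = 0; i < nr_pages; i++) {
		if (!IS_ERR_VALUE(entries[i]->handle))
			zpool_free(pool->zpool, entries[i]->handle);
		zswap_entry_cache_free(entries[i]);
	}
	kfree(entries);
	return false;
}
```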