On Mon, Jan 6, 2025 at 9:37 AM Sridhar, Kanchana P
<kanchana.p.sridhar@xxxxxxxxx> wrote:
>
> Hi Herbert,
>
> > -----Original Message-----
> > From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > Sent: Saturday, December 28, 2024 3:46 AM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; nphamcs@xxxxxxxxx;
> > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> > ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> > linux-crypto@xxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx;
> > ardb@xxxxxxxxxx; ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx;
> > Accardi, Kristen C <kristen.c.accardi@xxxxxxxxx>;
> > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> > Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
> > compress/decompress batching.
> >
> > On Fri, Dec 20, 2024 at 10:31:09PM -0800, Kanchana P Sridhar wrote:
> > > This commit adds get_batch_size(), batch_compress() and
> > > batch_decompress() interfaces to:
> >
> > First of all we don't need a batch compress/decompress interface
> > because the whole point of request chaining is to supply the data
> > in batches.
> >
> > I'm also against having a get_batch_size because the user should
> > be supplying as much data as they're comfortable with. In other
> > words if the user is happy to give us 8 requests for iaa then it
> > should be happy to give us 8 requests for every implementation.
> >
> > The request chaining interface should be such that processing
> > 8 requests is always better than doing 1 request at a time as
> > the cost is amortised.
>
> Thanks for your comments. Can you please elaborate on how
> request chaining would enable cost amortization for software
> compressors? With the current implementation, a module like
> zswap would need to do the following to invoke request chaining
> for software compressors (in addition to pushing the chaining
> to the user layer for IAA, as per your suggestion on not needing a
> batch compress/decompress interface):
>
> zswap_batch_compress():
>
>         for (i = 0; i < nr_pages_in_batch; ++i) {
>                 /* set up the acomp_req "reqs[i]". */
>                 [ ... ]
>
>                 if (i)
>                         acomp_request_chain(reqs[i], reqs[0]);
>                 else
>                         acomp_reqchain_init(reqs[0], 0, crypto_req_done, crypto_wait);
>         }
>
>         /* Process the request chain in series. */
>         err = crypto_wait_req(acomp_do_req_chain(reqs[0], crypto_acomp_compress),
>                               crypto_wait);
>
> Internally, acomp_do_req_chain() would sequentially process the
> request chain by:
>
> 1) adding all requests to a list "state"
> 2) calling "crypto_acomp_compress()" for the next list element
> 3) when this request completes, dequeuing it from the list "state"
> 4) repeating for all requests in "state"
> 5) when the last request in "state" completes, calling
>    "reqs[0]->base.complete()", which notifies crypto_wait
>
> From what I can understand, the latency cost should be the same for
> processing a request chain in series vs. processing each request as it is
> done today in zswap, by calling:
>
>         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->reqs[0]),
>                                    &acomp_ctx->wait);
>
> It is not clear to me if there is a cost amortization benefit for software
> compressors. One of the requirements from Yosry was that there should
> be no change for the software compressors, which is what I have
> attempted to do in v5.
>
> Can you please help us understand if there is room for optimizing
> the implementation of the synchronous "acomp_do_req_chain()" API?
> I would also like to get inputs from the zswap maintainers on using
> request chaining for a batching implementation for software compressors.

Is there a functional change in doing so, or is it just a matter of using
different interfaces to accomplish the same thing we do today?
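
To make sure I'm picturing the two options correctly, here is a rough,
untested sketch of both submission paths. The helper names
compress_batch_one_by_one() and compress_batch_chained() are purely
illustrative, and the chaining calls (acomp_reqchain_init(),
acomp_request_chain(), acomp_do_req_chain()) follow the signatures in
your pseudocode above, so please treat those as assumptions rather than
the final API:

#include <linux/crypto.h>
#include <crypto/acompress.h>

/* Today: one synchronous crypto_acomp_compress() call per page. */
static int compress_batch_one_by_one(struct acomp_req **reqs,
                                     struct crypto_wait *wait, int nr_pages)
{
        int i, err;

        for (i = 0; i < nr_pages; i++) {
                /* src/dst scatterlists are assumed to be set up already. */
                err = crypto_wait_req(crypto_acomp_compress(reqs[i]), wait);
                if (err)
                        return err;
        }

        return 0;
}

/*
 * With request chaining: link all requests to reqs[0], submit the whole
 * batch with one acomp_do_req_chain() call, and wait once for the final
 * completion.  Signatures follow the pseudocode above and may differ.
 */
static int compress_batch_chained(struct acomp_req **reqs,
                                  struct crypto_wait *wait, int nr_pages)
{
        int i;

        acomp_reqchain_init(reqs[0], 0, crypto_req_done, wait);
        for (i = 1; i < nr_pages; i++)
                acomp_request_chain(reqs[i], reqs[0]);

        return crypto_wait_req(acomp_do_req_chain(reqs[0],
                                                  crypto_acomp_compress),
                               wait);
}

If acomp_do_req_chain() simply walks the list and calls
crypto_acomp_compress() on each element, as described above, then for
software compressors the chained path looks functionally equivalent to
the per-page loop zswap has today, just expressed through a different
interface; the case where it could actually help seems to be a driver
like iaa that can keep several requests in flight before the single
final wait. Please correct me if I'm misreading the intended semantics.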