On Mon, Jan 6, 2025 at 9:37 AM Sridhar, Kanchana P
<kanchana.p.sridhar@xxxxxxxxx> wrote:
>
> Hi Herbert,
>
> > -----Original Message-----
> > From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > Sent: Saturday, December 28, 2024 3:46 AM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; nphamcs@xxxxxxxxx;
> > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> > ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> > linux-crypto@xxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx;
> > ardb@xxxxxxxxxx; ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx;
> > Accardi, Kristen C <kristen.c.accardi@xxxxxxxxx>;
> > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> > Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
> > compress/decompress batching.
> >
> > On Fri, Dec 20, 2024 at 10:31:09PM -0800, Kanchana P Sridhar wrote:
> > > This commit adds get_batch_size(), batch_compress() and
> > > batch_decompress() interfaces to:
> >
> > First of all we don't need a batch compress/decompress interface
> > because the whole point of request chaining is to supply the data
> > in batches.
> >
> > I'm also against having a get_batch_size because the user should
> > be supplying as much data as they're comfortable with. In other
> > words if the user is happy to give us 8 requests for iaa then it
> > should be happy to give us 8 requests for every implementation.
> >
> > The request chaining interface should be such that processing
> > 8 requests is always better than doing 1 request at a time as
> > the cost is amortised.
>
> Thanks for your comments. Can you please elaborate on how
> request chaining would enable cost amortization for software
> compressors? With the current implementation, a module like
> zswap would need to do the following to invoke request chaining
> for software compressors (in addition to pushing the chaining
> to the user layer for IAA, as per your suggestion on not needing a
> batch compress/decompress interface):
>
> zswap_batch_compress():
>
>         for (i = 0; i < nr_pages_in_batch; ++i) {
>                 /* set up the acomp_req "reqs[i]". */
>                 [ ... ]
>
>                 if (i)
>                         acomp_request_chain(reqs[i], reqs[0]);
>                 else
>                         acomp_reqchain_init(reqs[0], 0, crypto_req_done, crypto_wait);
>         }
>
>         /* Process the request chain in series. */
>         err = crypto_wait_req(acomp_do_req_chain(reqs[0], crypto_acomp_compress),
>                               crypto_wait);
>
> Internally, acomp_do_req_chain() would sequentially process the
> request chain by:
>
> 1) adding all requests to a list "state"
> 2) calling "crypto_acomp_compress()" for the next list element
> 3) when this request completes, dequeuing it from the list "state"
> 4) repeating for all requests in "state"
> 5) when the last request in "state" completes, calling
>    "reqs[0]->base.complete()", which notifies crypto_wait
>
> From what I can understand, the latency cost should be the same for
> processing a request chain in series vs. processing each request as it is
> done today in zswap, by calling:
>
>         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->reqs[0]),
>                                    &acomp_ctx->wait);
>
> It is not clear to me if there is a cost amortization benefit for software
> compressors. One of the requirements from Yosry was that there should
> be no change for the software compressors, which is what I have
> attempted to do in v5.
>
> Can you please help us understand if there is room for optimizing
> the implementation of the synchronous "acomp_do_req_chain()" API?
> I would also like to get inputs from the zswap maintainers on using
> request chaining for a batching implementation for software compressors.

Is there a functional change in doing so, or is it just a matter of using
different interfaces to accomplish the same thing we do today?
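
To make sure I'm picturing the two options correctly, here is a rough,
untested sketch of both submission paths. The helper names
compress_batch_one_by_one() and compress_batch_chained() are purely
illustrative, and the chaining calls (acomp_reqchain_init(),
acomp_request_chain(), acomp_do_req_chain()) follow the signatures in
your pseudocode above, so please treat those as assumptions rather than
the final API:

#include <linux/crypto.h>
#include <crypto/acompress.h>

/* Today: one synchronous crypto_acomp_compress() call per page. */
static int compress_batch_one_by_one(struct acomp_req **reqs,
                                     struct crypto_wait *wait, int nr_pages)
{
        int i, err;

        for (i = 0; i < nr_pages; i++) {
                /* src/dst scatterlists are assumed to be set up already. */
                err = crypto_wait_req(crypto_acomp_compress(reqs[i]), wait);
                if (err)
                        return err;
        }

        return 0;
}

/*
 * With request chaining: link all requests to reqs[0], submit the whole
 * batch with one acomp_do_req_chain() call, and wait once for the final
 * completion.  Signatures follow the pseudocode above and may differ.
 */
static int compress_batch_chained(struct acomp_req **reqs,
                                  struct crypto_wait *wait, int nr_pages)
{
        int i;

        acomp_reqchain_init(reqs[0], 0, crypto_req_done, wait);
        for (i = 1; i < nr_pages; i++)
                acomp_request_chain(reqs[i], reqs[0]);

        return crypto_wait_req(acomp_do_req_chain(reqs[0],
                                                  crypto_acomp_compress),
                               wait);
}

If acomp_do_req_chain() simply walks the list and calls
crypto_acomp_compress() on each element, as described above, then for
software compressors the chained path looks functionally equivalent to
the per-page loop zswap has today, just expressed through a different
interface; the case where it could actually help seems to be a driver
like iaa that can keep several requests in flight before the single
final wait. Please correct me if I'm misreading the intended semantics.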