RE: [PATCH v5 02/12] crypto: acomp - Define new interfaces for compress/decompress batching.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Yosry,

> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Monday, January 6, 2025 3:24 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>; linux-
> kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; hannes@xxxxxxxxxxx;
> nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx;
> akpm@xxxxxxxxxxxxxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx;
> davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx; ardb@xxxxxxxxxx;
> ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi, Kristen C
> <kristen.c.accardi@xxxxxxxxx>; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
> compress/decompress batching.
> 
> On Mon, Jan 6, 2025 at 9:37 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> > Hi Herbert,
> >
> > > -----Original Message-----
> > > From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > > Sent: Saturday, December 28, 2024 3:46 AM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > > hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; nphamcs@xxxxxxxxx;
> > > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> > > ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx; akpm@linux-
> foundation.org;
> > > linux-crypto@xxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> clabbe@xxxxxxxxxxxx;
> > > ardb@xxxxxxxxxx; ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi,
> > > Kristen C <kristen.c.accardi@xxxxxxxxx>; Feghali, Wajdi K
> > > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > > Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
> > > compress/decompress batching.
> > >
> > > On Fri, Dec 20, 2024 at 10:31:09PM -0800, Kanchana P Sridhar wrote:
> > > > This commit adds get_batch_size(), batch_compress() and
> > > batch_decompress()
> > > > interfaces to:
> > >
> > > First of all we don't need a batch compress/decompress interface
> > > because the whole point of request chaining is to supply the data
> > > in batches.
> > >
> > > I'm also against having a get_batch_size because the user should
> > > be supplying as much data as they're comfortable with.  In other
> > > words if the user is happy to give us 8 requests for iaa then it
> > > should be happy to give us 8 requests for every implementation.
> > >
> > > The request chaining interface should be such that processing
> > > 8 requests is always better than doing 1 request at a time as
> > > the cost is amortised.
> >
> > Thanks for your comments. Can you please elaborate on how
> > request chaining would enable cost amortization for software
> > compressors? With the current implementation, a module like
> > zswap would need to do the following to invoke request chaining
> > for software compressors (in addition to pushing the chaining
> > to the user layer for IAA, as per your suggestion on not needing a
> > batch compress/decompress interface):
> >
> > zswap_batch_compress():
> >    for (i = 0; i < nr_pages_in_batch; ++i) {
> >       /* set up the acomp_req "reqs[i]". */
> >       [ ... ]
> >       if (i)
> >         acomp_request_chain(reqs[i], reqs[0]);
> >       else
> >         acomp_reqchain_init(reqs[0], 0, crypto_req_done, crypto_wait);
> >    }
> >
> >    /* Process the request chain in series. */
> >    err = crypto_wait_req(acomp_do_req_chain(reqs[0],
> crypto_acomp_compress), crypto_wait);
> >
> > Internally, acomp_do_req_chain() would sequentially process the
> > request chain by:
> > 1) adding all requests to a list "state"
> > 2) call "crypto_acomp_compress()" for the next list element
> > 3) when this request completes, dequeue it from the list "state"
> > 4) repeat for all requests in "state"
> > 5) When the last request in "state" completes, call "reqs[0]-
> >base.complete()",
> >     which notifies crypto_wait.
> >
> > From what I can understand, the latency cost should be the same for
> > processing a request chain in series vs. processing each request as it is
> > done today in zswap, by calling:
> >
> >   comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx-
> >reqs[0]), &acomp_ctx->wait);
> >
> > It is not clear to me if there is a cost amortization benefit for software
> > compressors. One of the requirements from Yosry was that there should
> > be no change for the software compressors, which is what I have
> > attempted to do in v5.
> >
> > Can you please help us understand if there is a room for optimizing
> > the implementation of the synchronous "acomp_do_req_chain()" API?
> > I would also like to get inputs from the zswap maintainers on using
> > request chaining for a batching implementation for software compressors.
> 
> Is there a functional change in doing so, or just using different
> interfaces to accomplish the same thing we do today?

The code paths for software compressors are considerably different between
these two scenarios:

1) Given a batch of 8 pages: for each page, call zswap_compress() that does this:

    	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->reqs[0]), &acomp_ctx->wait);

2) Given a batch of 8 pages:
     a) Create a request chain of 8 acomp_reqs, starting with reqs[0], as
         described earlier.
     b) Process the request chain by calling:

              err = crypto_wait_req(acomp_do_req_chain(reqs[0], crypto_acomp_compress), &acomp_ctx->wait);
	/* Get each req's error status. */
	for (i = 0; i < nr_pages; ++i) {
		errors[i] = acomp_request_err(reqs[i]);
		if (errors[i]) {
			pr_debug("Request chaining req %d compress error %d\n", i, errors[i]);
		} else {
			dlens[i] = reqs[i]->dlen;
		}
	}

What I mean by considerably different code paths is that request chaining
internally overwrites the req's base.complete and base.data (after saving the
original values) to implement the algorithm described earlier. Basically, the
chain is processed in series by getting the next req in the chain, setting it's
completion function to "acomp_reqchain_done()", which gets called when
the "op" (crypto_acomp_compress()) is completed for that req.
acomp_reqchain_done() will cause the next req to be processed in the
same manner. If this next req happens to be the last req to be processed,
it will notify the original completion function of reqs[0], with the crypto_wait
that zswap sets up in zswap_cpu_comp_prepare():

	acomp_request_set_callback(acomp_ctx->reqs[0], CRYPTO_TFM_REQ_MAY_BACKLOG,
				   crypto_req_done, &acomp_ctx->wait);

Patch [1] in v5 of this series has the full implementation of acomp_do_req_chain()
in case you want to understand this in more detail.

The "functional change" wrt request chaining is limited to the above.

[1]: https://patchwork.kernel.org/project/linux-mm/patch/20241221063119.29140-2-kanchana.p.sridhar@xxxxxxxxx/

Thanks,
Kanchana





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux