On Wed, 28 Jun 2023 at 08:21, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Mon, Jun 26, 2023 at 06:13:44PM +0800, Herbert Xu wrote:
> > On Mon, Jun 26, 2023 at 12:03:04PM +0200, Ard Biesheuvel wrote:
> > >
> > > In any case, what I would like to see addressed is the horrid scomp to
> > > acomp layer that ties up megabytes of memory in scratch space, just to
> > > emulate the acomp interface on top of scomp drivers, while no code
> > > exists that makes use of the async nature. Do you have an idea on how
> > > we might address this particular issue?
> >
> > The whole reason why we need to allocate megabytes of memory is because
> > of the lack of SG lists in the underlying algorithm. If they
> > actually used SG lists and allocated pages as they went during
> > decompression, then we wouldn't need to pre-allocate any memory
> > at all.
>
> I don't think that is a realistic expectation. Decompressors generally need a
> contiguous buffer for decompressed data anyway, up to a certain size which is
> 32KB for DEFLATE but can be much larger for the more modern algorithms. This is
> because they decode "matches" that refer to previously decompressed data by
> offset, and it has to be possible to index the data efficiently.
>
> (Some decompressors, e.g. zlib, provide "streaming" APIs where you can read
> arbitrary amounts. But that works by actually decompressing into an internal
> buffer that has sufficient size, then copying to the user-provided buffer.)
>
> The same applies to compressors too, with regards to the original data.
>
> I think the "input/output is a list of pages" model just fundamentally does not
> work well for software compression and decompression. To support it, either
> large temporary buffers are needed (they might be hidden inside the
> (de)compressor, but they are there), or vmap() or vm_map_ram() is needed.
>
> FWIW, f2fs compression uses vm_map_ram() and skips the crypto API entirely...
>
> If acomp has to be kept for the hardware support, then maybe its scomp backend
> should use vm_map_ram() instead of scratch buffers?
>

Yeah, but we'll run into similar issues there, related to the fact that
scatterlists can describe arbitrary sequences of sub-page size memory
chunks, which means vmap()ing the pages may not be sufficient to get a
virtually contiguous representation of the buffers.

With zswap being the only current user, which uses a single contiguous
buffer for decompression out of place and blocks on the completion, the
level of additional complexity we have in the acomp stack is mind
boggling. And the scomp-to-acomp adaptation layer, with its fixed-size
per-CPU input and output buffers (implying that acomp input/output has
a hardcoded size limit), which are never freed, makes it rather
unpalatable to me, tbh.
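
To make that last point concrete, the glue layer has to do roughly the
following for every request (an illustrative sketch only, not the actual
crypto/scomp.c code: sync_decompress() is a hypothetical stand-in for the
synchronous scomp callback, the names are made up, and the scratch
allocation at init time is omitted). Every request gets bounced through
pre-allocated per-CPU linear buffers, and those buffers are also what
imposes the hardcoded size limit:

#include <linux/percpu.h>
#include <linux/errno.h>
#include <crypto/scatterwalk.h>
#include <crypto/acompress.h>

#define SCRATCH_SIZE	(128 * 1024)	/* illustrative per-request cap */

struct scratch_sketch {
	void *src;	/* SCRATCH_SIZE bytes, allocated at init, never freed */
	void *dst;	/* SCRATCH_SIZE bytes, allocated at init, never freed */
};

static DEFINE_PER_CPU(struct scratch_sketch, scratch_sketch);

/* hypothetical synchronous decompressor operating on linear buffers */
static int sync_decompress(const u8 *src, unsigned int slen,
			   u8 *dst, unsigned int *dlen);

static int sketch_acomp_decompress(struct acomp_req *req)
{
	struct scratch_sketch *s = get_cpu_ptr(&scratch_sketch);
	unsigned int dlen = req->dlen;
	int ret = -EINVAL;

	/* the scratch buffers impose a hard cap on the request size */
	if (req->slen > SCRATCH_SIZE || req->dlen > SCRATCH_SIZE)
		goto out;

	/* bounce #1: linearize the source scatterlist into scratch memory */
	scatterwalk_map_and_copy(s->src, req->src, 0, req->slen, 0);

	/* run the synchronous, linear-buffer decompressor */
	ret = sync_decompress(s->src, req->slen, s->dst, &dlen);
	if (ret)
		goto out;

	/* bounce #2: copy the result back into the destination scatterlist */
	scatterwalk_map_and_copy(s->dst, req->dst, 0, dlen, 1);
	req->dlen = dlen;
out:
	put_cpu_ptr(&scratch_sketch);
	return ret;
}

And substituting vm_map_ram() for those two copies only helps when the
scatterlist entries happen to be whole, page-aligned pages, which is
the sub-page problem above.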