On Wed, Jun 28, 2023 at 06:58:58PM +0200, Ard Biesheuvel wrote:
> On Wed, 28 Jun 2023 at 08:21, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > On Mon, Jun 26, 2023 at 06:13:44PM +0800, Herbert Xu wrote:
> > > On Mon, Jun 26, 2023 at 12:03:04PM +0200, Ard Biesheuvel wrote:
> > > >
> > > > In any case, what I would like to see addressed is the horrid scomp to
> > > > acomp layer that ties up megabytes of memory in scratch space, just to
> > > > emulate the acomp interface on top of scomp drivers, while no code
> > > > exists that makes use of the async nature.  Do you have an idea on how
> > > > we might address this particular issue?
> > >
> > > The whole reason why we need to allocate megabytes of memory is because
> > > of the lack of SG lists in the underlying algorithm.  If they
> > > actually used SG lists and allocated pages as they went during
> > > decompression, then we wouldn't need to pre-allocate any memory
> > > at all.
> >
> > I don't think that is a realistic expectation.  Decompressors generally
> > need a contiguous buffer for the decompressed data anyway, up to a certain
> > size, which is 32KB for DEFLATE but can be much larger for the more modern
> > algorithms.  This is because they decode "matches" that refer to
> > previously decompressed data by offset, and it has to be possible to
> > index that data efficiently.
> >
> > (Some decompressors, e.g. zlib, provide "streaming" APIs where you can
> > read arbitrary amounts.  But that works by actually decompressing into an
> > internal buffer of sufficient size, then copying to the user-provided
> > buffer.)
> >
> > The same applies to compressors too, with regards to the original data.
> >
> > I think the "input/output is a list of pages" model just fundamentally
> > does not work well for software compression and decompression.  To
> > support it, either large temporary buffers are needed (they might be
> > hidden inside the (de)compressor, but they are there), or vmap() or
> > vm_map_ram() is needed.
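To make the match-decoding constraint concrete, here is a toy LZ-style
decoder (an illustrative format invented for this sketch, not DEFLATE or any
real codec): each match copies bytes from a position `dist` bytes back in the
*already-produced output*, which is why the decompression window has to be
contiguous, or at least cheaply indexable:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy command stream: 'L' len <bytes>  = literal run,
 *                     'M' dist len     = copy len bytes from dist bytes back.
 * Illustrative only -- not a real compression format. */
static size_t toy_decompress(const unsigned char *in, size_t in_len,
			     unsigned char *out, size_t out_cap)
{
	size_t ip = 0, op = 0;

	while (ip < in_len) {
		if (in[ip] == 'L') {
			size_t len = in[ip + 1];

			memcpy(out + op, in + ip + 2, len);
			ip += 2 + len;
			op += len;
		} else { /* 'M' */
			size_t dist = in[ip + 1], len = in[ip + 2];

			/* The match source is earlier *output*, indexed as
			 * out[op - dist + i].  Byte-by-byte copying keeps
			 * overlapping (RLE-like) matches correct.  This
			 * random access into prior output is what breaks
			 * down when the output is a list of disjoint pages. */
			for (size_t i = 0; i < len; i++)
				out[op + i] = out[op - dist + i];
			ip += 3;
			op += len;
		}
		assert(op <= out_cap);
	}
	return op;
}
```

If `out` were a scatterlist of pages instead of a flat buffer, every one of
those `out[op - dist + i]` accesses would need a page lookup plus offset
arithmetic, which is exactly the cost real decompressors avoid by requiring a
linear window.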
> >
> > FWIW, f2fs compression uses vm_map_ram() and skips the crypto API
> > entirely...
> >
> > If acomp has to be kept for the hardware support, then maybe its scomp
> > backend should use vm_map_ram() instead of scratch buffers?
> >
>
> Yeah, but we'll run into similar issues related to the fact that
> scatterlists can describe arbitrary sequences of sub-page size memory
> chunks, which means vmap()ing the pages may not be sufficient to get a
> linear virtual representation of the buffers.

Yes, that is annoying...  Maybe the acomp API should not support arbitrary
scatterlists, but rather only ones that can be mapped contiguously?

>
> With zswap being the only current user, which uses a single contiguous
> buffer for decompression out of place, and blocks on the completion,
> the level of additional complexity we have in the acomp stack is mind
> boggling.  And the scomp-to-acomp adaptation layer, with its fixed-size
> per-CPU input and output buffers (implying that acomp in/output has a
> hardcoded size limit) which are never freed, makes it rather
> unpalatable to me tbh.

Either way, I think the way out may be that zswap should support *both* the
scomp and acomp APIs.  It should use scomp by default, and if someone *really*
wants to use a hardware (de)compression accelerator, they can configure it to
use acomp.

Also, I see that crypto/scompress.c currently allocates scratch buffers
(256KB per CPU!) whenever any crypto_scomp tfm is initialized.  That seems
wrong.  It should only happen when a crypto_acomp tfm that is backed by a
crypto_scomp tfm is initialized.  Then, if acomp is not used, all this
craziness will be avoided.

- Eric
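For reference, what the scomp-to-acomp adaptation layer effectively has to do
for a fragmented input can be sketched in userspace as a gather-copy into a
fixed-size bounce buffer.  All names below are hypothetical stand-ins (the
real code operates on struct scatterlist in crypto/scompress.c, and the real
scratch is 128KB per direction per CPU, not 8 bytes); the point is only to
show where the copying cost and the hardcoded size limit come from:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny stand-in for the fixed per-CPU scratch size, to keep the
 * failure case demonstrable. */
#define SCRATCH_SIZE 8

/* Stand-in for one scatterlist entry: a pointer/length chunk that may be
 * smaller than a page and discontiguous from its neighbors. */
struct frag {
	const unsigned char *ptr;
	size_t len;
};

/* Gather a fragmented "scatterlist" into one linear scratch buffer.
 * Returns the total bytes copied, or -1 if the input exceeds the fixed
 * scratch size -- mirroring the hardcoded acomp in/output limit that the
 * scomp adaptation layer imposes. */
static long linearize(const struct frag *frags, size_t nfrags,
		      unsigned char *scratch, size_t scratch_len)
{
	size_t total = 0;

	for (size_t i = 0; i < nfrags; i++) {
		if (total + frags[i].len > scratch_len)
			return -1;
		memcpy(scratch + total, frags[i].ptr, frags[i].len);
		total += frags[i].len;
	}
	return (long)total;
}
```

Every request pays this copy in, plus the mirror-image copy out, whether or
not the caller's buffers were already contiguous -- which is why skipping the
layer entirely (as f2fs does) or mapping the pages with vm_map_ram() looks
attractive, modulo the sub-page-chunk problem noted above.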