On Mon, Jun 26, 2023 at 06:13:44PM +0800, Herbert Xu wrote:
> On Mon, Jun 26, 2023 at 12:03:04PM +0200, Ard Biesheuvel wrote:
> >
> > In any case, what I would like to see addressed is the horrid scomp to
> > acomp layer that ties up megabytes of memory in scratch space, just to
> > emulate the acomp interface on top of scomp drivers, while no code
> > exists that makes use of the async nature. Do you have an idea on how
> > we might address this particular issue?
>
> The whole reason why we need to allocate megabytes of memory is because
> of the lack of SG lists in the underlying algorithm. If they
> actually used SG lists and allocated pages as they went during
> decompression, then we wouldn't need to pre-allocate any memory
> at all.

I don't think that is a realistic expectation.

Decompressors generally need a contiguous buffer for the decompressed
data anyway, up to a certain size: 32KB for DEFLATE, but much larger
for the more modern algorithms. This is because they decode "matches"
that refer to previously decompressed data by offset, and it has to be
possible to index that data efficiently.

(Some decompressors, e.g. zlib, provide "streaming" APIs that let you
read arbitrary amounts at a time. But that works by actually
decompressing into an internal buffer of sufficient size, then copying
to the user-provided buffer.)

The same applies to compressors too, with regard to the original data.

I think the "input/output is a list of pages" model just fundamentally
does not work well for software compression and decompression. To
support it, either large temporary buffers are needed (they might be
hidden inside the (de)compressor, but they are there), or vmap() or
vm_map_ram() is needed.

FWIW, f2fs compression uses vm_map_ram() and skips the crypto API
entirely...

If acomp has to be kept for the hardware support, then maybe its scomp
backend should use vm_map_ram() instead of scratch buffers?

- Eric