On Mon, Jan 6, 2025 at 6:06 PM Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jan 06, 2025 at 05:46:01PM -0800, Yosry Ahmed wrote:
> >
> > For software compressors, the batch size should be 1. In that
> > scenario, from a zswap perspective (without going into the acomp
> > implementation details please), is there a functional difference? If
> > not, we can just use the request chaining API regardless of batching
> > if that is what Herbert means.
>
> If you can supply a batch size of 8 for iaa, there is no reason
> why you can't do it for software algorithms. It's the same
> reason that we have GSO in the TCP stack, regardless of whether
> the hardware can handle TSO.

The main problem is memory usage. Zswap needs a PAGE_SIZE*2-sized
buffer for each request on each CPU. We preallocate these buffers to
avoid trying to allocate this much memory in the reclaim path (i.e.
potentially allocating two pages to reclaim one).

With batching, we need to preallocate N PAGE_SIZE*2-sized buffers on
each CPU instead. For N=8, we are allocating PAGE_SIZE*14 extra memory
on each CPU (56 KB on x86). That cost may be acceptable with IAA
hardware accelerated batching, but not for software compressors that
will end up processing the batch serially anyway.

Does this make sense to you or did I miss something?

>
> The amortisation of the segmentation cost means that it will be
> a win over-all.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
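
For reference, the per-CPU memory arithmetic in the reply above (N buffers of
PAGE_SIZE*2 each, compared with a single buffer today) can be sketched in a
few lines of C. This is only an illustration, not zswap code: it assumes
4 KiB pages as on x86, and SKETCH_PAGE_SIZE, SKETCH_BUF_SIZE and both helper
functions are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages (x86). The kernel's PAGE_SIZE varies by arch. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Per the text above, each request needs a PAGE_SIZE*2-sized buffer. */
#define SKETCH_BUF_SIZE (SKETCH_PAGE_SIZE * 2)

/* Hypothetical helper: bytes preallocated per CPU for a given batch size. */
static inline size_t percpu_buffer_bytes(size_t batch_size)
{
	assert(batch_size >= 1);
	return batch_size * SKETCH_BUF_SIZE;
}

/* Hypothetical helper: extra bytes per CPU relative to batch size 1. */
static inline size_t percpu_extra_bytes(size_t batch_size)
{
	return percpu_buffer_bytes(batch_size) - percpu_buffer_bytes(1);
}
```

With a batch size of 8, percpu_buffer_bytes(8) is 16 pages and
percpu_extra_bytes(8) is 14 pages, i.e. 14 * 4096 = 57344 bytes, which is
the 56 KB per CPU quoted above.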