Re: [PATCH v3 09/13] mm: zswap: Modify struct crypto_acomp_ctx to be configurable in nr of acomp_reqs.

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Fri, 8 Nov 2024 14:54:11 -0800

[..]
> > > >
> > > > There are no other callers to these functions. Just do the work
> > > > directly in the cpu callbacks here like it used to be.
> > >
> > > There will be other callers to zswap_create_acomp_ctx() and
> > > zswap_delete_acomp_ctx() in patches 10 and 11 of this series, when the
> > > per-cpu "acomp_batch_ctx" is introduced in struct zswap_pool. I was trying
> > > to modularize the code first, so as to split the changes into smaller commits.
> > >
> > > The per-cpu "acomp_batch_ctx" resources are allocated in patch 11 in the
> > > "zswap_pool_can_batch()" function, that allocates batching resources
> > > for this cpu. This was to address Yosry's earlier comment about minimizing
> > > the memory footprint cost of batching.
> > >
> > > The way I decided to do this is by reusing the code that allocates the de-
> > facto
> > > pool->acomp_ctx for the selected compressor for all cpu's in
> > zswap_pool_create().
> > > However, I did not want to add the acomp_batch_ctx multiple reqs/buffers
> > > allocation to the cpuhp_state_add_instance() code path which would incur
> > the
> > > memory cost on all cpu's.
> > >
> > > Instead, the approach I chose to follow is to allocate the batching resources
> > > in patch 11 only as needed, on "a given cpu" that has to store a large folio.
> > Hope
> > > this explains the purpose of the modularization better.
> > >
> > > Other ideas towards accomplishing this are very welcome.
> >
> > If we remove the sysctl as suggested by Johannes, then we can just
> > allocate the number of buffers based on the compressor and whether it
> > supports batching during the pool initialization in the cpu callbacks
> > only.
> >
> > Right?
>
> Yes, we could do that if the sysctl is removed, as suggested by Johannes.
> The only "drawback" of allocating the batching resources (assuming the
> compressor allows batching) would be that the memory footprint penalty
> would be incurred on every cpu. I was trying to further economize this
> cost based on whether a given cpu actually needs to zswap_store() a
> large folio, and only then allocate the batching resources. Although, I am
> not sure if this would benefit any usage model.
>
> If we agree the pool initialization is the best place to allocate the batching
> resources, then I will make this change in v4.

IIUC the additional cost would apply if someone wants to use
deflate-iaa on hardware that supports batching but does not want to
use batching. I don't think catering to such a use case warrants the
complexity in advance, not until we have a user that genuinely cares.