> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Friday, November 8, 2024 2:54 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying
> <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx;
> davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx; ardb@xxxxxxxxxx;
> ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi, Kristen C
> <kristen.c.accardi@xxxxxxxxx>; zanussi@xxxxxxxxxx; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v3 09/13] mm: zswap: Modify struct crypto_acomp_ctx
> to be configurable in nr of acomp_reqs.
>
> [..]
> > > > >
> > > > > There are no other callers to these functions. Just do the work
> > > > > directly in the cpu callbacks here like it used to be.
> > > >
> > > > There will be other callers to zswap_create_acomp_ctx() and
> > > > zswap_delete_acomp_ctx() in patches 10 and 11 of this series, when the
> > > > per-cpu "acomp_batch_ctx" is introduced in struct zswap_pool. I was trying
> > > > to modularize the code first, so as to split the changes into smaller commits.
> > > >
> > > > The per-cpu "acomp_batch_ctx" resources are allocated in patch 11 in the
> > > > "zswap_pool_can_batch()" function, that allocates batching resources
> > > > for this cpu. This was to address Yosry's earlier comment about minimizing
> > > > the memory footprint cost of batching.
> > > >
> > > > The way I decided to do this is by reusing the code that allocates the de-facto
> > > > pool->acomp_ctx for the selected compressor for all cpu's in zswap_pool_create().
> > > > However, I did not want to add the acomp_batch_ctx multiple reqs/buffers
> > > > allocation to the cpuhp_state_add_instance() code path which would incur the
> > > > memory cost on all cpu's.
> > > >
> > > > Instead, the approach I chose to follow is to allocate the batching resources
> > > > in patch 11 only as needed, on "a given cpu" that has to store a large folio.
> > > > Hope this explains the purpose of the modularization better.
> > > >
> > > > Other ideas towards accomplishing this are very welcome.
> > >
> > > If we remove the sysctl as suggested by Johannes, then we can just
> > > allocate the number of buffers based on the compressor and whether it
> > > supports batching during the pool initialization in the cpu callbacks
> > > only.
> > >
> > > Right?
> >
> > Yes, we could do that if the sysctl is removed, as suggested by Johannes.
> > The only "drawback" of allocating the batching resources (assuming the
> > compressor allows batching) would be that the memory footprint penalty
> > would be incurred on every cpu. I was trying to further economize this
> > cost based on whether a given cpu actually needs to zswap_store() a
> > large folio, and only then allocate the batching resources. Although, I am
> > not sure if this would benefit any usage model.
> >
> > If we agree the pool initialization is the best place to allocate the batching
> > resources, then I will make this change in v4.
>
> IIUC the additional cost would apply if someone wants to use
> deflate-iaa on hardware that supports batching but does not want to
> use batching. I don't think catering to such a use case warrants the
> complexity in advance, not until we have a user that genuinely cares.

Sure, this makes sense. I will address this in v4.

Thanks,
Kanchana
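
For concreteness, here is a rough sketch of the direction discussed above:
sizing the per-CPU requests/buffers in the cpuhp prepare callback based on
whether the compressor supports batching. This is not the actual patch code;
zswap_compressor_supports_batching() and SWAP_CRYPTO_BATCH_SIZE are
placeholder names, and the error unwinding and request callback/wait setup
done by the real zswap_cpu_comp_prepare() are elided.

/*
 * Rough sketch only (not the actual patch): allocate a variable number of
 * acomp requests and destination buffers per CPU in the cpuhp prepare
 * callback, sized by whether the selected compressor supports batching.
 * zswap_compressor_supports_batching() and SWAP_CRYPTO_BATCH_SIZE are
 * hypothetical placeholders; struct zswap_pool fields are as in mm/zswap.c.
 */
#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/percpu.h>
#include <crypto/acompress.h>

#define SWAP_CRYPTO_BATCH_SIZE	8U	/* placeholder batch size */

struct crypto_acomp_ctx {
	struct crypto_acomp *acomp;
	struct acomp_req **reqs;	/* nr_reqs requests */
	u8 **buffers;			/* nr_reqs destination buffers */
	unsigned int nr_reqs;
	struct mutex mutex;
};

static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
{
	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
	unsigned int nr_reqs, i;

	acomp_ctx->acomp = crypto_alloc_acomp(pool->tfm_name, 0, 0);
	if (IS_ERR(acomp_ctx->acomp))
		return PTR_ERR(acomp_ctx->acomp);

	/*
	 * Size the request/buffer arrays up front: one request for regular
	 * compressors, SWAP_CRYPTO_BATCH_SIZE for compressors (e.g.
	 * deflate-iaa) that can compress multiple pages in parallel.
	 */
	nr_reqs = zswap_compressor_supports_batching(acomp_ctx->acomp) ?
		  SWAP_CRYPTO_BATCH_SIZE : 1;

	acomp_ctx->reqs = kcalloc_node(nr_reqs, sizeof(*acomp_ctx->reqs),
				       GFP_KERNEL, cpu_to_node(cpu));
	acomp_ctx->buffers = kcalloc_node(nr_reqs, sizeof(*acomp_ctx->buffers),
					  GFP_KERNEL, cpu_to_node(cpu));
	if (!acomp_ctx->reqs || !acomp_ctx->buffers)
		goto err;

	for (i = 0; i < nr_reqs; i++) {
		acomp_ctx->reqs[i] = acomp_request_alloc(acomp_ctx->acomp);
		acomp_ctx->buffers[i] = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL,
						     cpu_to_node(cpu));
		if (!acomp_ctx->reqs[i] || !acomp_ctx->buffers[i])
			goto err;
	}

	acomp_ctx->nr_reqs = nr_reqs;
	mutex_init(&acomp_ctx->mutex);
	return 0;

err:
	/* free partially allocated reqs/buffers and the tfm; elided here */
	return -ENOMEM;
}

With this shape, the cpuhp dead callback would simply walk acomp_ctx->nr_reqs
to free whatever was allocated, and no per-store (lazy) allocation path is
needed.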