On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> wrote:
>
> In the !zpool_can_sleep_mapped() case such as zsmalloc, we need to first
> copy the entry->handle memory to a temporary memory, which is allocated
> using kmalloc.
>
> Obviously we can reuse the per-compressor dstmem to avoid allocating
> every time, since it's percpu-compressor and protected in mutex.

You are trading more memory for speed. Per-CPU data structures do not
come free; they are expensive in terms of memory on a big server with a
lot of CPUs, think more than a few hundred.

On such big servers we might want to disable this optimization to save
a few MB of RAM, depending on how much the optimization gains us.

Do we have any benchmarks suggesting how much CPU overhead or latency
this per-CPU page saves us, compared to using kmalloc?

Chris

>
> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> Reviewed-by: Nhat Pham <nphamcs@xxxxxxxxx>
> ---
>  mm/zswap.c | 29 +++++++++--------------------
>  1 file changed, 9 insertions(+), 20 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 7ee54a3d8281..edb8b45ed5a1 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1772,9 +1772,9 @@ bool zswap_load(struct folio *folio)
>  	struct zswap_entry *entry;
>  	struct scatterlist input, output;
>  	struct crypto_acomp_ctx *acomp_ctx;
> -	u8 *src, *dst, *tmp;
> +	unsigned int dlen = PAGE_SIZE;
> +	u8 *src, *dst;
>  	struct zpool *zpool;
> -	unsigned int dlen;
>  	bool ret;
>
>  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
> @@ -1796,27 +1796,18 @@
>  		goto stats;
>  	}
>
> -	zpool = zswap_find_zpool(entry);
> -	if (!zpool_can_sleep_mapped(zpool)) {
> -		tmp = kmalloc(entry->length, GFP_KERNEL);
> -		if (!tmp) {
> -			ret = false;
> -			goto freeentry;
> -		}
> -	}
> -
>  	/* decompress */
> -	dlen = PAGE_SIZE;
> -	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> +	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> +	mutex_lock(acomp_ctx->mutex);
>
> +	zpool = zswap_find_zpool(entry);
> +	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>  	if (!zpool_can_sleep_mapped(zpool)) {
> -		memcpy(tmp, src, entry->length);
> -		src = tmp;
> +		memcpy(acomp_ctx->dstmem, src, entry->length);
> +		src = acomp_ctx->dstmem;
>  		zpool_unmap_handle(zpool, entry->handle);
>  	}
>
> -	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -	mutex_lock(acomp_ctx->mutex);
>  	sg_init_one(&input, src, entry->length);
>  	sg_init_table(&output, 1);
>  	sg_set_page(&output, page, PAGE_SIZE, 0);
> @@ -1827,15 +1818,13 @@
>
>  	if (zpool_can_sleep_mapped(zpool))
>  		zpool_unmap_handle(zpool, entry->handle);
> -	else
> -		kfree(tmp);
>
>  	ret = true;
>  stats:
>  	count_vm_event(ZSWPIN);
>  	if (entry->objcg)
>  		count_objcg_event(entry->objcg, ZSWPIN);
> -freeentry:
> +
>  	spin_lock(&tree->lock);
>  	if (ret && zswap_exclusive_loads_enabled) {
>  		zswap_invalidate_entry(tree, entry);
>
> --
> b4 0.10.1
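
P.S. To put a rough number on the "few MB of RAM" above, here is a
minimal userspace sketch of the arithmetic. It assumes one 4 KiB page
of buffer per possible CPU (going by the "per-CPU page" phrasing; the
buffer zswap actually allocates may be larger), and the CPU counts are
just illustrative:

#include <stdio.h>

int main(void)
{
	/* Assumption: one 4 KiB buffer per CPU; CPU counts are examples. */
	const unsigned long page_size = 4096;
	const unsigned int cpus[] = { 8, 128, 512 };

	for (unsigned int i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
		printf("%3u CPUs: %4lu KiB of per-CPU buffers\n",
		       cpus[i], cpus[i] * page_size / 1024);
	return 0;
}

On a 512-CPU machine that works out to roughly 2 MiB of buffers per
compressor, memory that sits idle whenever zswap is not actively
compressing or decompressing.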