On Wed, Dec 27, 2023 at 7:32 PM Chengming Zhou
<zhouchengming@xxxxxxxxxxxxx> wrote:
>
> On 2023/12/27 09:24, Barry Song wrote:
> > On Wed, Dec 27, 2023 at 4:56 AM Chengming Zhou
> > <zhouchengming@xxxxxxxxxxxxx> wrote:
> >>
> >> In the !zpool_can_sleep_mapped() case such as zsmalloc, we need to first
> >> copy the entry->handle memory into a temporary buffer, which is allocated
> >> using kmalloc.
> >>
> >> Obviously we can reuse the per-compressor dstmem to avoid allocating
> >> every time, since it's per-cpu for each compressor and protected by a
> >> percpu mutex.
> >
> > what is the benefit of this since we are actually increasing lock contention
> > by reusing this buffer between multiple compression and decompression
> > threads?
>
> This mutex is already reused in all compress/decompress paths even before
> the reuse optimization. I think the best way may be to use a separate
> crypto_acomp for compression and decompression.
>
> Do you think the lock contention will be increased because we now put
> zpool_map_handle() and memcpy() in the lock section? Actually, we can move
> zpool_map_handle() before the lock section if needed, but that memcpy()
> should be protected in the lock section.
>
> >
> > this mainly affects zsmalloc which can't sleep? do we have performance
> > data?
>
> Right, last time when testing I remember there was only a very minor
> performance difference. The main benefit here is to simplify the code a
> lot and delete one failure case.

ok. On the majority of hardware, people use CPU-based compression and
decompression, so there is no chance it will sleep. Thus, all
compression/decompression can be done inside a zpool_map section; there
is *NO* need to copy at all!

Only for hardware that provides a HW accelerator to offload the CPU will
crypto actually wait for completion, in

static inline int crypto_wait_req(int err, struct crypto_wait *wait)
{
        switch (err) {
        case -EINPROGRESS:
        case -EBUSY:
                wait_for_completion(&wait->completion);
                reinit_completion(&wait->completion);
                err = wait->err;
                break;
        }

        return err;
}

For CPU-based algorithms, the compression/decompression has already
completed synchronously inside crypto_acomp_decompress(); it won't
return -EINPROGRESS or -EBUSY.

The problem is that crypto_acomp doesn't expose this information to its
users. If it did, we could use it to completely avoid copying zsmalloc's
data to a tmp buffer for the vast majority of zswap users (a rough sketch
of what I mean is at the end of this mail).

But I am not sure whether we can find a way to convince Herbert (+To) :-)

>
> >
> > and it seems this patch is also negatively affecting z3fold and zbud.c
> > which actually don't need to allocate a tmp buffer.
> >
>
> As for z3fold and zbud, the influence should be much less since the only
> difference here is that zpool_map_handle() moved into the lock section,
> which could be moved out if needed as noted above. And also there is no
> evident performance regression in the testing.
>
> Thanks.
>
> >>
> >> Reviewed-by: Nhat Pham <nphamcs@xxxxxxxxx>
> >> Reviewed-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> >> Acked-by: Chris Li <chrisl@xxxxxxxxxx> (Google)
> >> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> >> ---
> >>  mm/zswap.c | 44 ++++++++++++--------------------------------
> >>  1 file changed, 12 insertions(+), 32 deletions(-)
> >>
> >> diff --git a/mm/zswap.c b/mm/zswap.c
> >> index 976f278aa507..6b872744e962 100644
> >> --- a/mm/zswap.c
> >> +++ b/mm/zswap.c
> >> @@ -1417,19 +1417,13 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> >>         struct crypto_acomp_ctx *acomp_ctx;
> >>         struct zpool *pool = zswap_find_zpool(entry);
> >>         bool page_was_allocated;
> >> -       u8 *src, *tmp = NULL;
> >> +       u8 *src;
> >>         unsigned int dlen;
> >>         int ret;
> >>         struct writeback_control wbc = {
> >>                 .sync_mode = WB_SYNC_NONE,
> >>         };
> >>
> >> -       if (!zpool_can_sleep_mapped(pool)) {
> >> -               tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >> -               if (!tmp)
> >> -                       return -ENOMEM;
> >> -       }
> >> -
> >>         /* try to allocate swap cache page */
> >>         mpol = get_task_policy(current);
> >>         page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
> >> @@ -1465,15 +1459,15 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> >>         /* decompress */
> >>         acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> >>         dlen = PAGE_SIZE;
> >> +       mutex_lock(acomp_ctx->mutex);
> >>
> >>         src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
> >>         if (!zpool_can_sleep_mapped(pool)) {
> >> -               memcpy(tmp, src, entry->length);
> >> -               src = tmp;
> >> +               memcpy(acomp_ctx->dstmem, src, entry->length);
> >> +               src = acomp_ctx->dstmem;
> >>                 zpool_unmap_handle(pool, entry->handle);
> >>         }
> >>
> >> -       mutex_lock(acomp_ctx->mutex);
> >>         sg_init_one(&input, src, entry->length);
> >>         sg_init_table(&output, 1);
> >>         sg_set_page(&output, page, PAGE_SIZE, 0);
> >> @@ -1482,9 +1476,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> >>         dlen = acomp_ctx->req->dlen;
> >>         mutex_unlock(acomp_ctx->mutex);
> >>
> >> -       if (!zpool_can_sleep_mapped(pool))
> >> -               kfree(tmp);
> >> -       else
> >> +       if (zpool_can_sleep_mapped(pool))
> >>                 zpool_unmap_handle(pool, entry->handle);
> >>
> >>         BUG_ON(ret);
> >> @@ -1508,9 +1500,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> >>         return ret;
> >>
> >> fail:
> >> -       if (!zpool_can_sleep_mapped(pool))
> >> -               kfree(tmp);
> >> -
> >>         /*
> >>          * If we get here because the page is already in swapcache, a
> >>          * load may be happening concurrently. It is safe and okay to
> >> @@ -1771,7 +1760,7 @@ bool zswap_load(struct folio *folio)
> >>         struct zswap_entry *entry;
> >>         struct scatterlist input, output;
> >>         struct crypto_acomp_ctx *acomp_ctx;
> >> -       u8 *src, *dst, *tmp;
> >> +       u8 *src, *dst;
> >>         struct zpool *zpool;
> >>         unsigned int dlen;
> >>         bool ret;
> >> @@ -1796,26 +1785,19 @@ bool zswap_load(struct folio *folio)
> >>         }
> >>
> >>         zpool = zswap_find_zpool(entry);
> >> -       if (!zpool_can_sleep_mapped(zpool)) {
> >> -               tmp = kmalloc(entry->length, GFP_KERNEL);
> >> -               if (!tmp) {
> >> -                       ret = false;
> >> -                       goto freeentry;
> >> -               }
> >> -       }
> >>
> >>         /* decompress */
> >>         dlen = PAGE_SIZE;
> >> -       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> >> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> >> +       mutex_lock(acomp_ctx->mutex);
> >>
> >> +       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> >>         if (!zpool_can_sleep_mapped(zpool)) {
> >> -               memcpy(tmp, src, entry->length);
> >> -               src = tmp;
> >> +               memcpy(acomp_ctx->dstmem, src, entry->length);
> >> +               src = acomp_ctx->dstmem;
> >>                 zpool_unmap_handle(zpool, entry->handle);
> >>         }
> >>
> >> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> >> -       mutex_lock(acomp_ctx->mutex);
> >>         sg_init_one(&input, src, entry->length);
> >>         sg_init_table(&output, 1);
> >>         sg_set_page(&output, page, PAGE_SIZE, 0);
> >> @@ -1826,15 +1808,13 @@ bool zswap_load(struct folio *folio)
> >>
> >>         if (zpool_can_sleep_mapped(zpool))
> >>                 zpool_unmap_handle(zpool, entry->handle);
> >> -       else
> >> -               kfree(tmp);
> >>
> >>         ret = true;
> >> stats:
> >>         count_vm_event(ZSWPIN);
> >>         if (entry->objcg)
> >>                 count_objcg_event(entry->objcg, ZSWPIN);
> >> -freeentry:
> >> +
> >>         spin_lock(&tree->lock);
> >>         if (ret && zswap_exclusive_loads_enabled) {
> >>                 zswap_invalidate_entry(tree, entry);
> >>
> >> --
> >> b4 0.10.1
> >>

Thanks
Barry
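
PS: to make the idea concrete, here is a rough, untested sketch of what I
mean. acomp_is_async() is purely hypothetical (today's crypto API has no
such helper, which is exactly the problem); it only peeks at
CRYPTO_ALG_ASYNC in the algorithm's cra_flags. The second fragment reuses
the names from the patch above (acomp_ctx, dstmem, etc.). With something
like this, the zswap decompress path could keep zsmalloc's mapping across
a synchronous CPU decompression and only copy into dstmem when a real HW
offload might actually sleep in crypto_wait_req():

/* hypothetical helper, not in today's crypto API */
static inline bool acomp_is_async(struct crypto_acomp *tfm)
{
        return crypto_acomp_tfm(tfm)->__crt_alg->cra_flags & CRYPTO_ALG_ASYNC;
}

        /* sketch of the zswap_load() decompress section built on it */
        dlen = PAGE_SIZE;
        acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
        mutex_lock(acomp_ctx->mutex);

        src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
        if (!zpool_can_sleep_mapped(zpool) && acomp_is_async(acomp_ctx->acomp)) {
                /* only a HW offload can sleep, so only then copy out */
                memcpy(acomp_ctx->dstmem, src, entry->length);
                src = acomp_ctx->dstmem;
                zpool_unmap_handle(zpool, entry->handle);
        }

        sg_init_one(&input, src, entry->length);
        sg_init_table(&output, 1);
        sg_set_page(&output, page, PAGE_SIZE, 0);
        acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
        BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
        mutex_unlock(acomp_ctx->mutex);

        /* unmap here if we did not already unmap before decompressing */
        if (zpool_can_sleep_mapped(zpool) || !acomp_is_async(acomp_ctx->acomp))
                zpool_unmap_handle(zpool, entry->handle);

Again, this is just to illustrate what exposing the sync/async property
would buy us for the common CPU-only case; the naming and the plumbing
are entirely up for discussion.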