I've been looking at the patch and it looks like it should work. Having
numbers to back up the performance in the pure-software version and with
HW acceleration would be _very_ nice to have.

On 2020-11-07 19:53:32 [+1300], Barry Song wrote:
> index fbb7829..73f04de 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -415,30 +445,54 @@ static int zswap_dstmem_dead(unsigned int cpu)
…
> +	acomp_ctx->req = req;
> +
> +	crypto_init_wait(&acomp_ctx->wait);
> +	/*
> +	 * if the backend of acomp is async zip, crypto_req_done() will wakeup
> +	 * crypto_wait_req(); if the backend of acomp is scomp, the callback
> +	 * won't be called, crypto_wait_req() will return without blocking.
> +	 */
> +	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> +				   crypto_req_done, &acomp_ctx->wait);
> +
> +	acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> +	acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);

You added a comment here and there, but you never mentioned that this
single per-CPU mutex protects both the per-CPU context (of which you can
have more than one on a single CPU) and the scratch/dstmem, which is one
per CPU. Of course, if you read the code you can figure it out.

I still think that you should have a pool of memory and crypto contexts
to draw from instead of having them strictly per-CPU. The code is fully
preemptible and you may have multiple requests on the same CPU. Yes, the
locking works, but at the same time you block processing while waiting
on a lock, and the "reserved memory" on other CPUs remains unused.

Sebastian
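
To make the pool idea above a bit more concrete, here is a rough,
purely illustrative sketch. All names here (zswap_pool_entry,
zswap_ctx_get/zswap_ctx_put, the pool list itself) are invented for the
example, and allocation/teardown of the entries is omitted; the point
is only to show a shared free-list of contexts that any CPU can draw
from, not a definitive implementation.

/*
 * Hypothetical sketch, not part of the patch: a small shared pool of
 * compression contexts instead of strictly per-CPU ones.
 */
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/crypto.h>
#include <crypto/acompress.h>

struct zswap_pool_entry {
	struct list_head node;
	struct crypto_acomp *acomp;
	struct acomp_req *req;
	struct crypto_wait wait;
	u8 *dstmem;			/* scratch buffer owned by this entry */
};

static LIST_HEAD(zswap_ctx_pool);
static DEFINE_SPINLOCK(zswap_ctx_pool_lock);
static DECLARE_WAIT_QUEUE_HEAD(zswap_ctx_pool_waitq);

/* Take any free entry; sleep until one is returned if the pool is empty. */
static struct zswap_pool_entry *zswap_ctx_get(void)
{
	struct zswap_pool_entry *entry;

	spin_lock(&zswap_ctx_pool_lock);
	while (list_empty(&zswap_ctx_pool)) {
		spin_unlock(&zswap_ctx_pool_lock);
		wait_event(zswap_ctx_pool_waitq, !list_empty(&zswap_ctx_pool));
		spin_lock(&zswap_ctx_pool_lock);
	}
	entry = list_first_entry(&zswap_ctx_pool, struct zswap_pool_entry, node);
	list_del(&entry->node);
	spin_unlock(&zswap_ctx_pool_lock);

	return entry;
}

/* Return an entry so a request on any other CPU can reuse it. */
static void zswap_ctx_put(struct zswap_pool_entry *entry)
{
	spin_lock(&zswap_ctx_pool_lock);
	list_add(&entry->node, &zswap_ctx_pool);
	spin_unlock(&zswap_ctx_pool_lock);
	wake_up(&zswap_ctx_pool_waitq);
}

A caller would then bracket compression with zswap_ctx_get()/
zswap_ctx_put(), so an idle context set up on one CPU can service a
request running on another, instead of that request blocking on its own
CPU's mutex.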