I've been looking at the patch and it looks like it should work. Having
numbers to back up the performance in the pure-software version and with
HW acceleration would be _very_ nice to have.

On 2020-11-07 19:53:32 [+1300], Barry Song wrote:
> index fbb7829..73f04de 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -415,30 +445,54 @@ static int zswap_dstmem_dead(unsigned int cpu)
…
> +	acomp_ctx->req = req;
> +
> +	crypto_init_wait(&acomp_ctx->wait);
> +	/*
> +	 * if the backend of acomp is async zip, crypto_req_done() will wakeup
> +	 * crypto_wait_req(); if the backend of acomp is scomp, the callback
> +	 * won't be called, crypto_wait_req() will return without blocking.
> +	 */
> +	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> +				   crypto_req_done, &acomp_ctx->wait);
> +
> +	acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> +	acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);

You added a comment here and there, but you never mentioned that this
single per-CPU mutex protects both the per-CPU context (of which you can
have more than one on a single CPU) and the scratch/dstmem, which is one
per CPU. Of course, if you read the code you can figure it out.

I still think that you should have a pool of memory and crypto contexts
to draw from instead of having them strictly per-CPU. The code is fully
preemptible and you may have multiple requests on the same CPU. Yes, the
locking works, but at the same time you block processing while waiting
on a lock, and the "reserved memory" on other CPUs remains unused.

Sebastian
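
To make the pool idea above a bit more concrete, here is a rough,
purely illustrative sketch. All names here (zswap_pool_entry,
zswap_ctx_get/zswap_ctx_put, the pool list itself) are invented for the
example, and allocation/teardown of the entries is omitted; the point
is only to show a shared free-list of contexts that any CPU can draw
from, not a definitive implementation.

/*
 * Hypothetical sketch, not part of the patch: a small shared pool of
 * compression contexts instead of strictly per-CPU ones.
 */
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/crypto.h>
#include <crypto/acompress.h>

struct zswap_pool_entry {
	struct list_head node;
	struct crypto_acomp *acomp;
	struct acomp_req *req;
	struct crypto_wait wait;
	u8 *dstmem;			/* scratch buffer owned by this entry */
};

static LIST_HEAD(zswap_ctx_pool);
static DEFINE_SPINLOCK(zswap_ctx_pool_lock);
static DECLARE_WAIT_QUEUE_HEAD(zswap_ctx_pool_waitq);

/* Take any free entry; sleep until one is returned if the pool is empty. */
static struct zswap_pool_entry *zswap_ctx_get(void)
{
	struct zswap_pool_entry *entry;

	spin_lock(&zswap_ctx_pool_lock);
	while (list_empty(&zswap_ctx_pool)) {
		spin_unlock(&zswap_ctx_pool_lock);
		wait_event(zswap_ctx_pool_waitq, !list_empty(&zswap_ctx_pool));
		spin_lock(&zswap_ctx_pool_lock);
	}
	entry = list_first_entry(&zswap_ctx_pool, struct zswap_pool_entry, node);
	list_del(&entry->node);
	spin_unlock(&zswap_ctx_pool_lock);

	return entry;
}

/* Return an entry so a request on any other CPU can reuse it. */
static void zswap_ctx_put(struct zswap_pool_entry *entry)
{
	spin_lock(&zswap_ctx_pool_lock);
	list_add(&entry->node, &zswap_ctx_pool);
	spin_unlock(&zswap_ctx_pool_lock);
	wake_up(&zswap_ctx_pool_waitq);
}

A caller would then bracket compression with zswap_ctx_get()/
zswap_ctx_put(), so an idle context set up on one CPU can service a
request running on another, instead of that request blocking on its own
CPU's mutex.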