On 2025/2/27 07:47, Nhat Pham wrote:
On Wed, Feb 26, 2025 at 1:23 PM Yosry Ahmed <yosry.ahmed@xxxxxxxxx> wrote:
On Wed, Feb 26, 2025 at 09:16:28PM +0000, Eric Biggers wrote:
On Wed, Feb 26, 2025 at 08:32:22PM +0000, Yosry Ahmed wrote:
On Wed, Feb 26, 2025 at 08:00:16PM +0000, Eric Biggers wrote:
On Wed, Feb 26, 2025 at 06:56:25PM +0000, Yosry Ahmed wrote:
Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
the per-CPU acomp_ctx mutex. crypto_free_acomp() then holds scomp_lock
(through crypto_exit_scomp_ops_async()).
On the other hand, crypto_alloc_acomp_node() holds the scomp_lock
(through crypto_scomp_init_tfm()), and then allocates memory.
If the allocation results in reclaim, we may attempt to hold the per-CPU
acomp_ctx mutex.
The bug is in acomp. crypto_free_acomp() should never have to wait for a memory
allocation. That is what needs to be fixed.
crypto_free_acomp() does not explicitly wait for an allocation, but it
waits for scomp_lock (in crypto_exit_scomp_ops_async()), which may be
held while allocating memory from crypto_scomp_init_tfm().
Are you suggesting that crypto_exit_scomp_ops_async() should not be
holding scomp_lock?
I think the solution while keeping the bounce buffer in place would be to do
what the patch
https://lore.kernel.org/linux-crypto/Z6w7Pz8jBeqhijut@xxxxxxxxxxxxxxxxxxx/ does,
i.e. make the actual allocation and free happen outside the lock.
I am fine with a solution like that if Herbert is fine with it. Although
as I mentioned, I think this patch is nice to have anyway.
But really the bounce buffering in acomp (which is what is causing this problem)
should not exist at all. There is really no practical use case for it; it's
just there because of the Crypto API's insistence on shoehorning everything into
scatterlists for no reason...
I am assuming this about scomp_scratch logic, which is what we need to
hold the scomp_lock for, resulting in this problem.
Yes.
If this is something that can be done right away I am fine with dropping
this patch for an alternative fix, although it may be nice to reduce the
lock critical section in zswap_cpu_comp_dead() to the bare minimum
anyway.
Well, unfortunately the whole Crypto API philosophy of having a single interface
for software and for hardware offload doesn't really work. This is just yet
another example of that; it's a problem caused by shoehorning software
compression into an interface designed for hardware offload. zcomp really
should just use the compression libs directly (like most users of compression in
the kernel already do), and have an alternate code path specifically for
hardware offload (using acomp) for the few people who really want that.
zcomp is for zram, zswap does not use it. If zswap is not going to use
the crypto API we'll want something like zcomp or maybe reuse zcomp
itself. That's a problem for another day :)
I'm actually thinking whether we should expose the zcomp API and use
it for zswap. There are a couple of parameters for zstd I wanna play
with, which zcomp/zram seems to already support, but not the crypto
API (zstd level, dictionary, etc.).
Ah, agree! Actually I also think we should use the zcomp API in zswap,
if its API meets our requirements.
But yes, a different problem for another day :)