On Wed, Nov 13, 2024 at 1:30 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote: > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > The code is fine as is. > > Like you said, acomp_ctx->buffer (the pointer) doesn't change. It > points to whatever was kmalloced in zswap_cpu_comp_prepare(). The > handle points to backend memory. Neither of those addresses can change > under us. There is no confusing them, and they cannot coincide. > > The mutex guards the *memory* behind the buffer, so that we don't have > multiple (de)compressors stepping on each others' toes. But it's fine > to drop the mutex once we're done working with the memory. We don't > need the mutex to check whether src holds the acomp buffer address. I have to admit that I confused myself with this alleged bug more than I like to admit :) I initially thought acomp_ctx->buffer can be changed, then when I realized it cannot be changed I did not tie that back to the 'fix' not being needed at all. I need more coffee. > > That being said, I do think there is a UAF bug in CPU hotplugging. > > There is an acomp_ctx for each cpu, but note that this is best effort > parallelism, not a guarantee that we always have the context of the > local CPU. Look closely: we pick the "local" CPU with preemption > enabled, then contend for the mutex. This may well put us to sleep and > get us migrated, so we could be using the context of a CPU we are no > longer running on. This is fine because we hold the mutex - if that > other CPU tries to use the acomp_ctx, it'll wait for us. > > However, if we get migrated and vacate the CPU whose context we have > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > the context underneath us. I think we need to refcount the acomp_ctx.