> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Wednesday, January 8, 2025 8:15 AM
> To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; Nhat Pham
> <nphamcs@xxxxxxxxx>; Chengming Zhou <chengming.zhou@xxxxxxxxx>;
> Vitaly Wool <vitalywool@xxxxxxxxx>; Barry Song <baohua@xxxxxxxxxx>; Sam
> Sun <samsun1006219@xxxxxxxxx>; Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx>; linux-mm@xxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Yosry Ahmed <yosryahmed@xxxxxxxxxx>;
> stable@xxxxxxxxxxxxxxx
> Subject: [PATCH] mm: zswap: properly synchronize freeing resources during
> CPU hotunplug
>
> In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> current CPU at the beginning of the operation is retrieved and used
> throughout. However, since neither preemption nor migration are
> disabled, it is possible that the operation continues on a different
> CPU.
>
> If the original CPU is hotunplugged while the acomp_ctx is still in use,
> we run into a UAF bug as some of the resources attached to the acomp_ctx
> are freed during hotunplug in zswap_cpu_comp_dead().
>
> The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to
> use crypto_acomp API for hardware acceleration") when the switch to the
> crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> retrieved using get_cpu_ptr() which disables preemption and makes sure
> the CPU cannot go away from under us. Preemption cannot be disabled
> with the crypto_acomp API as a sleepable context is needed.
>
> During CPU hotunplug, hold the acomp_ctx.mutex before freeing any
> resources, and set acomp_ctx.req to NULL when it is freed. In the
> compress/decompress paths, after acquiring the acomp_ctx.mutex make sure
> that acomp_ctx.req is not NULL (i.e. acomp_ctx resources were not freed
> by CPU hotunplug). Otherwise, retry with the acomp_ctx from the new CPU.
>
> This adds proper synchronization to ensure that the acomp_ctx resources
> are not freed from under compress/decompress paths.
>
> Note that the per-CPU acomp_ctx itself (including the mutex) is not
> freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and
> acomp_ctx.acomp. So it is safe to acquire the acomp_ctx.mutex of a CPU
> after it is hotunplugged.

The only other fail-proofing I can think of is to initialize the mutex
right after the per-CPU acomp_ctx is allocated in zswap_pool_create(),
and to decouple it from CPU onlining. This would further clarify that the
mutex is meant to have the same lifetime as the acomp_ctx itself,
independent of CPU hotplug/hotunplug.

Thanks,
Kanchana

>
> Previously a fix was attempted by holding cpus_read_lock() [1]. This
> would have caused a potential deadlock as it is possible for code
> already holding the lock to fall into reclaim and enter zswap (causing a
> deadlock). A fix was also attempted using SRCU for synchronization, but
> Johannes pointed out that synchronize_srcu() cannot be used in CPU
> hotplug notifiers [2].
>
> Alternative fixes that were considered/attempted and could have worked:
> - Refcounting the per-CPU acomp_ctx. This involves complexity in
>   handling the race between the refcount dropping to zero in
>   zswap_[de]compress() and the refcount being re-initialized when the
>   CPU is onlined.
> - Disabling migration before getting the per-CPU acomp_ctx [3], but
>   that's discouraged and is a much bigger hammer than needed, and could
>   result in subtle performance issues.
>
> [1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@xxxxxxxxxx/
> [2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@xxxxxxxxxx/
> [3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@xxxxxxxxxx/
>
> Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@xxxxxxxxxxx/
> Reported-by: Sam Sun <samsun1006219@xxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@xxxxxxxxxxxxxx/
> ---
>
> This applies on top of the latest mm-hotfixes-unstable on top of 'Revert
> "mm: zswap: fix race between [de]compression and CPU hotunplug"' and
> after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was
> dropped.
>
> ---
>  mm/zswap.c | 42 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f6316b66fb236..4e3148050e093 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -869,17 +869,46 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>  	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
>  	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>
> +	mutex_lock(&acomp_ctx->mutex);
>  	if (!IS_ERR_OR_NULL(acomp_ctx)) {
>  		if (!IS_ERR_OR_NULL(acomp_ctx->req))
>  			acomp_request_free(acomp_ctx->req);
> +		acomp_ctx->req = NULL;
>  		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
>  			crypto_free_acomp(acomp_ctx->acomp);
>  		kfree(acomp_ctx->buffer);
>  	}
> +	mutex_unlock(&acomp_ctx->mutex);
>
>  	return 0;
>  }
>
> +static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(
> +		struct crypto_acomp_ctx __percpu *acomp_ctx)
> +{
> +	struct crypto_acomp_ctx *ctx;
> +
> +	for (;;) {
> +		ctx = raw_cpu_ptr(acomp_ctx);
> +		mutex_lock(&ctx->mutex);
> +		if (likely(ctx->req))
> +			return ctx;
> +		/*
> +		 * It is possible that we were migrated to a different CPU after
> +		 * getting the per-CPU ctx but before the mutex was acquired. If
> +		 * the old CPU got offlined, zswap_cpu_comp_dead() could have
> +		 * already freed ctx->req (among other things) and set it to
> +		 * NULL. Just try again on the new CPU that we ended up on.
> +		 */
> +		mutex_unlock(&ctx->mutex);
> +	}
> +}
> +
> +static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *ctx)
> +{
> +	mutex_unlock(&ctx->mutex);
> +}
> +
>  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  			   struct zswap_pool *pool)
>  {
> @@ -893,10 +922,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	gfp_t gfp;
>  	u8 *dst;
>
> -	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> -
> -	mutex_lock(&acomp_ctx->mutex);
> -
> +	acomp_ctx = acomp_ctx_get_cpu_lock(pool->acomp_ctx);
>  	dst = acomp_ctx->buffer;
>  	sg_init_table(&input, 1);
>  	sg_set_page(&input, page, PAGE_SIZE, 0);
> @@ -949,7 +975,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	else if (alloc_ret)
>  		zswap_reject_alloc_fail++;
>
> -	mutex_unlock(&acomp_ctx->mutex);
> +	acomp_ctx_put_unlock(acomp_ctx);
>  	return comp_ret == 0 && alloc_ret == 0;
>  }
>
> @@ -960,9 +986,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	struct crypto_acomp_ctx *acomp_ctx;
>  	u8 *src;
>
> -	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -	mutex_lock(&acomp_ctx->mutex);
> -
> +	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool->acomp_ctx);
>  	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>  	/*
>  	 * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
> @@ -986,10 +1010,10 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
>  	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
>  	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> -	mutex_unlock(&acomp_ctx->mutex);
>
>  	if (src != acomp_ctx->buffer)
>  		zpool_unmap_handle(zpool, entry->handle);
> +	acomp_ctx_put_unlock(acomp_ctx);
>  }
>
>  /*********************************
> --
> 2.47.1.613.gc27f4b7a9f-goog
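
For reference, the lock-then-recheck pattern in the patch, together with
the idea of initializing the mutex when the ctx is allocated, can be
sketched in userspace roughly as below. This is only an illustration and
not kernel code: pthread mutexes stand in for the kernel mutex, a scripted
current_cpu() stands in for raw_cpu_ptr() observing a migration, and every
name here (struct ctx, ctxs, cpu_trace, ctx_init_all, ctx_online, ctx_dead,
ctx_get_lock, ctx_put_unlock) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_CPUS 2

/* Hypothetical userspace stand-in for struct crypto_acomp_ctx. */
struct ctx {
	pthread_mutex_t mutex;	/* same lifetime as the ctx itself */
	void *req;		/* NULL while the "CPU" is offline */
};

static struct ctx ctxs[NR_CPUS];

/* Scripted "current CPU": successive reads model the task being migrated. */
static int cpu_trace[] = { 0, 1 };
static size_t trace_pos;
static int retries;

static int current_cpu(void)
{
	size_t last = sizeof(cpu_trace) / sizeof(cpu_trace[0]) - 1;
	int cpu = cpu_trace[trace_pos];

	if (trace_pos < last)
		trace_pos++;
	return cpu;
}

/* Allocation time: every mutex is initialized once, independent of hotplug. */
static void ctx_init_all(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		pthread_mutex_init(&ctxs[i].mutex, NULL);
		ctxs[i].req = NULL;
	}
}

/* "CPU online": attach the resources while holding the mutex. */
static int ctx_online(int cpu)
{
	void *req = malloc(64);

	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!req)
		return -1;
	pthread_mutex_lock(&ctxs[cpu].mutex);
	ctxs[cpu].req = req;
	pthread_mutex_unlock(&ctxs[cpu].mutex);
	return 0;
}

/* "CPU dead": free under the mutex and leave NULL behind as the marker. */
static void ctx_dead(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pthread_mutex_lock(&ctxs[cpu].mutex);
	free(ctxs[cpu].req);
	ctxs[cpu].req = NULL;
	pthread_mutex_unlock(&ctxs[cpu].mutex);
}

/* Pick the current ctx, lock it, and retry if it was torn down meanwhile. */
static struct ctx *ctx_get_lock(void)
{
	for (;;) {
		struct ctx *c = &ctxs[current_cpu()];

		pthread_mutex_lock(&c->mutex);
		if (c->req)
			return c;
		/* Resources were freed after this ctx was picked; try again. */
		pthread_mutex_unlock(&c->mutex);
		retries++;
	}
}

static void ctx_put_unlock(struct ctx *c)
{
	pthread_mutex_unlock(&c->mutex);
}
```

With both contexts onlined and context 0 then torn down, ctx_get_lock()
first picks the stale context 0, sees req == NULL under the mutex, drops
the lock and returns context 1 on the second pass. Note the sketch spins
forever if the scripted trace ends on an offline context; in the kernel
that cannot happen because the task is always running on an online CPU.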