On Mon, Jan 20, 2025 at 5:37 AM <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> The patch below does not apply to the 6.6-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@xxxxxxxxxxxxxxx>.
>
> To reproduce the conflict and resubmit, you may use the following commands:
>
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
> git checkout FETCH_HEAD
> git cherry-pick -x 12dcb0ef540629a281533f9dedc1b6b8e14cfb65
> # <resolve conflicts, build, test, etc.>
> git commit -s
> git send-email --to '<stable@xxxxxxxxxxxxxxx>' --in-reply-to '2025012012-capacity-swiftly-0214@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..

I sent a backport for this (and the followup fix) to v6.12. However, for
v6.6 (and previous kernels), it's more complicated. That goes back to when
we had a separate hotplug notifier to allocate the buffers (when they were
shared by different acomp_ctxs), and before a lot of refactoring was done
(e.g. before writeback shared code with decompression).

Given that the problem has existed since v5.11 but was only reported now,
and is a rare race condition with CPU hotunplug, I don't think it's worth
backporting the fix before v6.12 given the complexity. If anyone objects
please let me know. Otherwise I won't send backports further back than
v6.12.

>
> Possible dependencies:
>
>
> thanks,
>
> greg k-h
>
> ------------------ original commit in Linus's tree ------------------
>
> From 12dcb0ef540629a281533f9dedc1b6b8e14cfb65 Mon Sep 17 00:00:00 2001
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Date: Wed, 8 Jan 2025 22:24:41 +0000
> Subject: [PATCH] mm: zswap: properly synchronize freeing resources during CPU
>  hotunplug
>
> In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> current CPU at the beginning of the operation is retrieved and used
> throughout. However, since neither preemption nor migration are disabled,
> it is possible that the operation continues on a different CPU.
>
> If the original CPU is hotunplugged while the acomp_ctx is still in use,
> we run into a UAF bug as some of the resources attached to the acomp_ctx
> are freed during hotunplug in zswap_cpu_comp_dead() (i.e.
> acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp).
>
> The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
> crypto_acomp API for hardware acceleration") when the switch to the
> crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> retrieved using get_cpu_ptr() which disables preemption and makes sure the
> CPU cannot go away from under us. Preemption cannot be disabled with the
> crypto_acomp API as a sleepable context is needed.
>
> Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating
> and freeing resources with compression/decompression paths. Make sure
> that acomp_ctx.req is NULL when the resources are freed. In the
> compression/decompression paths, check if acomp_ctx.req is NULL after
> acquiring the mutex (meaning the CPU was offlined) and retry on the new
> CPU.
>
> The initialization of acomp_ctx.mutex is moved from the CPU hotplug
> callback to the pool initialization where it belongs (where the mutex is
> allocated). In addition to adding clarity, this makes sure that CPU
> hotplug cannot reinitialize a mutex that is already locked by
> compression/decompression.
>
> Previously a fix was attempted by holding cpus_read_lock() [1]. This
> would have caused a potential deadlock as it is possible for code already
> holding the lock to fall into reclaim and enter zswap (causing a
> deadlock). A fix was also attempted using SRCU for synchronization, but
> Johannes pointed out that synchronize_srcu() cannot be used in CPU hotplug
> notifiers [2].
>
> Alternative fixes that were considered/attempted and could have worked:
> - Refcounting the per-CPU acomp_ctx. This involves complexity in
>   handling the race between the refcount dropping to zero in
>   zswap_[de]compress() and the refcount being re-initialized when the
>   CPU is onlined.
> - Disabling migration before getting the per-CPU acomp_ctx [3], but
>   that's discouraged and is a much bigger hammer than needed, and could
>   result in subtle performance issues.
>
> [1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@xxxxxxxxxx/
> [2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@xxxxxxxxxx/
> [3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@xxxxxxxxxx/
>
> [yosryahmed@xxxxxxxxxx: remove comment]
>   Link: https://lkml.kernel.org/r/CAJD7tkaxS1wjn+swugt8QCvQ-rVF5RZnjxwPGX17k8x9zSManA@xxxxxxxxxxxxxx
> Link: https://lkml.kernel.org/r/20250108222441.3622031-1-yosryahmed@xxxxxxxxxx
> Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
> Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@xxxxxxxxxxx/
> Reported-by: Sam Sun <samsun1006219@xxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@xxxxxxxxxxxxxx/
> Cc: Barry Song <baohua@xxxxxxxxxx>
> Cc: Chengming Zhou <chengming.zhou@xxxxxxxxx>
> Cc: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Nhat Pham <nphamcs@xxxxxxxxx>
> Cc: Vitaly Wool <vitalywool@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f6316b66fb23..30f5a27a6862 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>  	struct zswap_pool *pool;
>  	char name[38]; /* 'zswap' + 32 char (max) num + \0 */
>  	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
> -	int ret;
> +	int ret, cpu;
>
>  	if (!zswap_has_pool) {
>  		/* if either are unset, pool initialization failed, and we
> @@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>  		goto error;
>  	}
>
> +	for_each_possible_cpu(cpu)
> +		mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
> +
>  	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
>  				       &pool->node);
>  	if (ret)
> @@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  	struct acomp_req *req;
>  	int ret;
>
> -	mutex_init(&acomp_ctx->mutex);
> -
> +	mutex_lock(&acomp_ctx->mutex);
>  	acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
> -	if (!acomp_ctx->buffer)
> -		return -ENOMEM;
> +	if (!acomp_ctx->buffer) {
> +		ret = -ENOMEM;
> +		goto buffer_fail;
> +	}
>
>  	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
>  	if (IS_ERR(acomp)) {
> @@ -855,12 +859,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>  				   crypto_req_done, &acomp_ctx->wait);
>
> +	mutex_unlock(&acomp_ctx->mutex);
>  	return 0;
>
> req_fail:
>  	crypto_free_acomp(acomp_ctx->acomp);
> acomp_fail:
>  	kfree(acomp_ctx->buffer);
> +buffer_fail:
> +	mutex_unlock(&acomp_ctx->mutex);
>  	return ret;
>  }
>
> @@ -869,17 +876,45 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>  	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
>  	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>
> +	mutex_lock(&acomp_ctx->mutex);
>  	if (!IS_ERR_OR_NULL(acomp_ctx)) {
>  		if (!IS_ERR_OR_NULL(acomp_ctx->req))
>  			acomp_request_free(acomp_ctx->req);
> +		acomp_ctx->req = NULL;
>  		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
>  			crypto_free_acomp(acomp_ctx->acomp);
>  		kfree(acomp_ctx->buffer);
>  	}
> +	mutex_unlock(&acomp_ctx->mutex);
>
>  	return 0;
>  }
>
> +static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(struct zswap_pool *pool)
> +{
> +	struct crypto_acomp_ctx *acomp_ctx;
> +
> +	for (;;) {
> +		acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> +		mutex_lock(&acomp_ctx->mutex);
> +		if (likely(acomp_ctx->req))
> +			return acomp_ctx;
> +		/*
> +		 * It is possible that we were migrated to a different CPU after
> +		 * getting the per-CPU ctx but before the mutex was acquired. If
> +		 * the old CPU got offlined, zswap_cpu_comp_dead() could have
> +		 * already freed ctx->req (among other things) and set it to
> +		 * NULL. Just try again on the new CPU that we ended up on.
> +		 */
> +		mutex_unlock(&acomp_ctx->mutex);
> +	}
> +}
> +
> +static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
> +{
> +	mutex_unlock(&acomp_ctx->mutex);
> +}
> +
>  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  			   struct zswap_pool *pool)
>  {
> @@ -893,10 +928,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	gfp_t gfp;
>  	u8 *dst;
>
> -	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> -
> -	mutex_lock(&acomp_ctx->mutex);
> -
> +	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
>  	dst = acomp_ctx->buffer;
>  	sg_init_table(&input, 1);
>  	sg_set_page(&input, page, PAGE_SIZE, 0);
> @@ -949,7 +981,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	else if (alloc_ret)
>  		zswap_reject_alloc_fail++;
>
> -	mutex_unlock(&acomp_ctx->mutex);
> +	acomp_ctx_put_unlock(acomp_ctx);
>  	return comp_ret == 0 && alloc_ret == 0;
>  }
>
> @@ -960,9 +992,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	struct crypto_acomp_ctx *acomp_ctx;
>  	u8 *src;
>
> -	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -	mutex_lock(&acomp_ctx->mutex);
> -
> +	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
>  	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>  	/*
>  	 * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
> @@ -986,10 +1016,10 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
>  	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
>  	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> -	mutex_unlock(&acomp_ctx->mutex);
>
>  	if (src != acomp_ctx->buffer)
>  		zpool_unmap_handle(zpool, entry->handle);
> +	acomp_ctx_put_unlock(acomp_ctx);
>  }
>
> /*********************************