On Mon, Jan 20, 2025 at 5:37 AM <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> The patch below does not apply to the 6.6-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@xxxxxxxxxxxxxxx>.
>
> To reproduce the conflict and resubmit, you may use the following commands:
>
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
> git checkout FETCH_HEAD
> git cherry-pick -x 12dcb0ef540629a281533f9dedc1b6b8e14cfb65
> # <resolve conflicts, build, test, etc.>
> git commit -s
> git send-email --to '<stable@xxxxxxxxxxxxxxx>' --in-reply-to '2025012012-capacity-swiftly-0214@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..

I sent a backport for this (and the followup fix) to v6.12. However, for
v6.6 (and previous kernels), it's more complicated. That goes back to when
we had a separate hotplug notifier to allocate the buffers (when they were
shared by different acomp_ctxs), and before a lot of refactoring was done
(e.g. before writeback shared code with decompression).

Given that the problem has existed since v5.11 but was only reported now,
and is a rare race condition with CPU hotunplug, I don't think it's worth
backporting the fix before v6.12 given the complexity. If anyone objects
please let me know. Otherwise I won't send backports further back than
v6.12.

>
> Possible dependencies:
>
>
> thanks,
>
> greg k-h
>
> ------------------ original commit in Linus's tree ------------------
>
> From 12dcb0ef540629a281533f9dedc1b6b8e14cfb65 Mon Sep 17 00:00:00 2001
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Date: Wed, 8 Jan 2025 22:24:41 +0000
> Subject: [PATCH] mm: zswap: properly synchronize freeing resources during CPU
>  hotunplug
>
> In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> current CPU at the beginning of the operation is retrieved and used
> throughout. However, since neither preemption nor migration are disabled,
> it is possible that the operation continues on a different CPU.
>
> If the original CPU is hotunplugged while the acomp_ctx is still in use,
> we run into a UAF bug as some of the resources attached to the acomp_ctx
> are freed during hotunplug in zswap_cpu_comp_dead() (i.e.
> acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp).
>
> The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
> crypto_acomp API for hardware acceleration") when the switch to the
> crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> retrieved using get_cpu_ptr() which disables preemption and makes sure the
> CPU cannot go away from under us. Preemption cannot be disabled with the
> crypto_acomp API as a sleepable context is needed.
>
> Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating
> and freeing resources with compression/decompression paths. Make sure
> that acomp_ctx.req is NULL when the resources are freed. In the
> compression/decompression paths, check if acomp_ctx.req is NULL after
> acquiring the mutex (meaning the CPU was offlined) and retry on the new
> CPU.
>
> The initialization of acomp_ctx.mutex is moved from the CPU hotplug
> callback to the pool initialization where it belongs (where the mutex is
> allocated). In addition to adding clarity, this makes sure that CPU
> hotplug cannot reinitialize a mutex that is already locked by
> compression/decompression.
>
> Previously a fix was attempted by holding cpus_read_lock() [1]. This
> would have caused a potential deadlock as it is possible for code already
> holding the lock to fall into reclaim and enter zswap (causing a
> deadlock). A fix was also attempted using SRCU for synchronization, but
> Johannes pointed out that synchronize_srcu() cannot be used in CPU hotplug
> notifiers [2].
>
> Alternative fixes that were considered/attempted and could have worked:
> - Refcounting the per-CPU acomp_ctx. This involves complexity in
>   handling the race between the refcount dropping to zero in
>   zswap_[de]compress() and the refcount being re-initialized when the
>   CPU is onlined.
> - Disabling migration before getting the per-CPU acomp_ctx [3], but
>   that's discouraged and is a much bigger hammer than needed, and could
>   result in subtle performance issues.
>
> [1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@xxxxxxxxxx/
> [2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@xxxxxxxxxx/
> [3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@xxxxxxxxxx/
>
> [yosryahmed@xxxxxxxxxx: remove comment]
>   Link: https://lkml.kernel.org/r/CAJD7tkaxS1wjn+swugt8QCvQ-rVF5RZnjxwPGX17k8x9zSManA@xxxxxxxxxxxxxx
> Link: https://lkml.kernel.org/r/20250108222441.3622031-1-yosryahmed@xxxxxxxxxx
> Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
> Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@xxxxxxxxxxx/
> Reported-by: Sam Sun <samsun1006219@xxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@xxxxxxxxxxxxxx/
> Cc: Barry Song <baohua@xxxxxxxxxx>
> Cc: Chengming Zhou <chengming.zhou@xxxxxxxxx>
> Cc: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx>
> Cc: Nhat Pham <nphamcs@xxxxxxxxx>
> Cc: Vitaly Wool <vitalywool@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f6316b66fb23..30f5a27a6862 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>  	struct zswap_pool *pool;
>  	char name[38]; /* 'zswap' + 32 char (max) num + \0 */
>  	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
> -	int ret;
> +	int ret, cpu;
>
>  	if (!zswap_has_pool) {
>  		/* if either are unset, pool initialization failed, and we
> @@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>  		goto error;
>  	}
>
> +	for_each_possible_cpu(cpu)
> +		mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
> +
>  	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
>  				       &pool->node);
>  	if (ret)
> @@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  	struct acomp_req *req;
>  	int ret;
>
> -	mutex_init(&acomp_ctx->mutex);
> -
> +	mutex_lock(&acomp_ctx->mutex);
>  	acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
> -	if (!acomp_ctx->buffer)
> -		return -ENOMEM;
> +	if (!acomp_ctx->buffer) {
> +		ret = -ENOMEM;
> +		goto buffer_fail;
> +	}
>
>  	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
>  	if (IS_ERR(acomp)) {
> @@ -855,12 +859,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>  				   crypto_req_done, &acomp_ctx->wait);
>
> +	mutex_unlock(&acomp_ctx->mutex);
>  	return 0;
>
> req_fail:
>  	crypto_free_acomp(acomp_ctx->acomp);
> acomp_fail:
>  	kfree(acomp_ctx->buffer);
> +buffer_fail:
> +	mutex_unlock(&acomp_ctx->mutex);
>  	return ret;
>  }
>
> @@ -869,17 +876,45 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>  	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
>  	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>
> +	mutex_lock(&acomp_ctx->mutex);
>  	if (!IS_ERR_OR_NULL(acomp_ctx)) {
>  		if (!IS_ERR_OR_NULL(acomp_ctx->req))
>  			acomp_request_free(acomp_ctx->req);
> +		acomp_ctx->req = NULL;
>  		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
>  			crypto_free_acomp(acomp_ctx->acomp);
>  		kfree(acomp_ctx->buffer);
>  	}
> +	mutex_unlock(&acomp_ctx->mutex);
>
>  	return 0;
>  }
>
> +static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(struct zswap_pool *pool)
> +{
> +	struct crypto_acomp_ctx *acomp_ctx;
> +
> +	for (;;) {
> +		acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> +		mutex_lock(&acomp_ctx->mutex);
> +		if (likely(acomp_ctx->req))
> +			return acomp_ctx;
> +		/*
> +		 * It is possible that we were migrated to a different CPU after
> +		 * getting the per-CPU ctx but before the mutex was acquired. If
> +		 * the old CPU got offlined, zswap_cpu_comp_dead() could have
> +		 * already freed ctx->req (among other things) and set it to
> +		 * NULL. Just try again on the new CPU that we ended up on.
> +		 */
> +		mutex_unlock(&acomp_ctx->mutex);
> +	}
> +}
> +
> +static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
> +{
> +	mutex_unlock(&acomp_ctx->mutex);
> +}
> +
>  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  			   struct zswap_pool *pool)
>  {
> @@ -893,10 +928,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	gfp_t gfp;
>  	u8 *dst;
>
> -	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> -
> -	mutex_lock(&acomp_ctx->mutex);
> -
> +	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
>  	dst = acomp_ctx->buffer;
>  	sg_init_table(&input, 1);
>  	sg_set_page(&input, page, PAGE_SIZE, 0);
> @@ -949,7 +981,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	else if (alloc_ret)
>  		zswap_reject_alloc_fail++;
>
> -	mutex_unlock(&acomp_ctx->mutex);
> +	acomp_ctx_put_unlock(acomp_ctx);
>  	return comp_ret == 0 && alloc_ret == 0;
>  }
>
> @@ -960,9 +992,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	struct crypto_acomp_ctx *acomp_ctx;
>  	u8 *src;
>
> -	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -	mutex_lock(&acomp_ctx->mutex);
> -
> +	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
>  	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>  	/*
>  	 * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
> @@ -986,10 +1016,10 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
>  	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
>  	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> -	mutex_unlock(&acomp_ctx->mutex);
>
>  	if (src != acomp_ctx->buffer)
>  		zpool_unmap_handle(zpool, entry->handle);
> +	acomp_ctx_put_unlock(acomp_ctx);
>  }
>
> /*********************************