On Wed, Jan 8, 2025 at 4:12 PM Sridhar, Kanchana P
<kanchana.p.sridhar@xxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > Sent: Wednesday, January 8, 2025 2:25 PM
> > To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; Nhat Pham <nphamcs@xxxxxxxxx>;
> > Chengming Zhou <chengming.zhou@xxxxxxxxx>; Vitaly Wool <vitalywool@xxxxxxxxx>;
> > Barry Song <baohua@xxxxxxxxxx>; Sam Sun <samsun1006219@xxxxxxxxx>;
> > Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux-mm@xxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; Yosry Ahmed <yosryahmed@xxxxxxxxxx>;
> > stable@xxxxxxxxxxxxxxx
> > Subject: [PATCH v2] mm: zswap: properly synchronize freeing resources
> > during CPU hotunplug
> >
> > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> > current CPU at the beginning of the operation is retrieved and used
> > throughout. However, since neither preemption nor migration are
> > disabled, it is possible that the operation continues on a different
> > CPU.
> >
> > If the original CPU is hotunplugged while the acomp_ctx is still in use,
> > we run into a UAF bug as some of the resources attached to the acomp_ctx
> > are freed during hotunplug in zswap_cpu_comp_dead() (i.e.
> > acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp).
> >
> > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to
> > use crypto_acomp API for hardware acceleration") when the switch to the
> > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> > retrieved using get_cpu_ptr() which disables preemption and makes sure
> > the CPU cannot go away from under us. Preemption cannot be disabled
> > with the crypto_acomp API as a sleepable context is needed.
> >
> > Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating
> > and freeing resources with compression/decompression paths.
> > Make sure
> > that acomp_ctx.req is NULL when the resources are freed. In the
> > compression/decompression paths, check if acomp_ctx.req is NULL after
> > acquiring the mutex (meaning the CPU was offlined) and retry on the new
> > CPU.
> >
> > The initialization of acomp_ctx.mutex is moved from the CPU hotplug
> > callback to the pool initialization where it belongs (where the mutex is
> > allocated). In addition to adding clarity, this makes sure that CPU
> > hotplug cannot reinitialize a mutex that is already locked by
> > compression/decompression.
> >
> > Previously a fix was attempted by holding cpus_read_lock() [1]. This
> > would have caused a potential deadlock as it is possible for code
> > already holding the lock to fall into reclaim and enter zswap (causing a
> > deadlock). A fix was also attempted using SRCU for synchronization, but
> > Johannes pointed out that synchronize_srcu() cannot be used in CPU
> > hotplug notifiers [2].
> >
> > Alternative fixes that were considered/attempted and could have worked:
> > - Refcounting the per-CPU acomp_ctx. This involves complexity in
> >   handling the race between the refcount dropping to zero in
> >   zswap_[de]compress() and the refcount being re-initialized when the
> >   CPU is onlined.
> > - Disabling migration before getting the per-CPU acomp_ctx [3], but
> >   that's discouraged and is a much bigger hammer than needed, and could
> >   result in subtle performance issues.
> >
> > [1] https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@xxxxxxxxxx/
> > [2] https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@xxxxxxxxxx/
> > [3] https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@xxxxxxxxxx/
> >
> > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for
> > hardware acceleration")
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@xxxxxxxxxxx/
> > Reported-by: Sam Sun <samsun1006219@xxxxxxxxx>
> > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@xxxxxxxxxxxxxx/
> > ---
> >
> > This applies on top of the latest mm-hotfixes-unstable on top of 'Revert
> > "mm: zswap: fix race between [de]compression and CPU hotunplug"' and
> > after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was
> > dropped.
> >
> > v1 -> v2:
> > - Move the initialization of the mutex to pool initialization.
> > - Use the mutex to also synchronize with the CPU hotplug callback (i.e.
> >   zswap_cpu_comp_prep()).
> > - Naming cleanups.
> >
> > ---
> >  mm/zswap.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 46 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb236..4d7e564732267 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
> >  	struct zswap_pool *pool;
> >  	char name[38]; /* 'zswap' + 32 char (max) num + \0 */
> >  	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
> > -	int ret;
> > +	int ret, cpu;
> >
> >  	if (!zswap_has_pool) {
> >  		/* if either are unset, pool initialization failed, and we
> > @@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
> >  		goto error;
> >  	}
> >
> > +	for_each_possible_cpu(cpu)
> > +		mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
> > +
> >  	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
> >  				       &pool->node);
> >  	if (ret)
> > @@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
> >  	struct acomp_req *req;
> >  	int ret;
> >
> > -	mutex_init(&acomp_ctx->mutex);
> > -
> > +	mutex_lock(&acomp_ctx->mutex);
> >  	acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
> > -	if (!acomp_ctx->buffer)
> > -		return -ENOMEM;
> > +	if (!acomp_ctx->buffer) {
> > +		ret = -ENOMEM;
> > +		goto buffer_fail;
> > +	}
> >
> >  	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
> >  	if (IS_ERR(acomp)) {
> > @@ -844,6 +848,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
> >  		ret = -ENOMEM;
> >  		goto req_fail;
> >  	}
> > +
> > +	/* acomp_ctx->req must be NULL if the acomp_ctx is not fully initialized */
> >  	acomp_ctx->req = req;
>
> For this to happen, shouldn't we directly assign:
>	acomp_ctx->req = acomp_request_alloc(acomp_ctx->acomp);
>	if (!acomp_ctx->req) { ... }

For the initial
zswap_cpu_comp_prepare() call on a CPU, it doesn't matter because a
failure will cause a failure of initialization or CPU onlining. For
calls after a CPU is offlined, zswap_cpu_comp_dead() will have set
acomp_ctx->req to NULL, so leaving it unmodified keeps it as NULL.

Although I suppose the comment is not really clear, because
zswap_cpu_comp_prepare() could fail before acomp_request_alloc() is
ever called, and this would not be a problem (as the onlining will
fail). Maybe it's best to just drop the comment.

Andrew, could you please fold this in?

diff --git a/mm/zswap.c b/mm/zswap.c
index 4d7e564732267..30f5a27a68620 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -848,8 +848,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 		ret = -ENOMEM;
 		goto req_fail;
 	}
-
-	/* acomp_ctx->req must be NULL if the acomp_ctx is not fully initialized */
 	acomp_ctx->req = req;
 	crypto_init_wait(&acomp_ctx->wait);

> I was wondering how error conditions encountered in
> zswap_cpu_comp_prepare() will impact zswap_[de]compress(). This is
> probably unrelated to this patch itself, but is my understanding
> correct that an error in this procedure will cause zswap_enabled to be
> set to false, which will cause any zswap_stores() to fail early?

Yeah, that is my understanding, for the initial call. For later calls
when a CPU gets onlined, I think the CPU onlining fails and the
acomp_ctx is not used.