> -----Original Message----- > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > Sent: Wednesday, November 13, 2024 12:11 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx; > usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh > <vinodh.gopal@xxxxxxxxx> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Wed, Nov 13, 2024 at 11:12 AM Sridhar, Kanchana P > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > Hi Yosry, > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > > Sent: Tuesday, November 12, 2024 10:22 PM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > > > hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; > chengming.zhou@xxxxxxxxx; > > > usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying > > > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux- > foundation.org; > > > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh > > > <vinodh.gopal@xxxxxxxxx> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P > > > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > > > > Sent: Tuesday, November 12, 2024 9:35 PM > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > > > > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > > > > > hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; > > > chengming.zhou@xxxxxxxxx; > > > > > usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying > > > > > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux- > > > foundation.org; > > > > > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh > > > > > <vinodh.gopal@xxxxxxxxx> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > zswap_decompress(). > > > > > > > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > > > > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > > > > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > > > > the existing zswap_decompress(): > > > > > > > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > > > Releasing the lock before the conditional does not protect the > integrity > > > of > > > > > > "src", which is set earlier under the acomp_ctx mutex lock. This > poses a > > > > > > risk for the conditional behaving as intended, and consequently not > > > > > > unmapping the zpool handle, which could cause a zswap zpool > memory > > > > > leak. > > > > > > > > > > > > This patch moves the mutex_unlock() to occur after the conditional > and > > > > > > subsequent zpool_unmap_handle(). This ensures that the value of > "src" > > > > > > obtained earlier, with the mutex locked, does not change. > > > > > > > > > > The commit log is too complicated and incorrect. It is talking about > > > > > the stability of 'src', but that's a local variable on the stack > > > > > anyway. It doesn't need protection. > > > > > > > > > > The problem is 'acomp_ctx->buffer' being reused and changed after > the > > > > > mutex is released. Leading to the check not being reliable. Please > > > > > simplify this. > > > > > > > > Thanks Yosry. That's exactly what I meant, but I think the wording got > > > > confusing. The problem I was trying to fix is the acomp_ctx->buffer > > > > value changing after the lock is released. This could happen as a result > of > > > any > > > > other compress or decompress that acquires the lock. I will simplify and > > > > clarify accordingly. > > > > > > > > > > > > > > > > > > > > > Even though an actual memory leak was not observed, this fix seems > > > like a > > > > > > cleaner implementation. > > > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx> > > > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > > > > --- > > > > > > mm/zswap.c | 3 ++- > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > > index f6316b66fb23..58810fa8ff23 100644 > > > > > > --- a/mm/zswap.c > > > > > > +++ b/mm/zswap.c > > > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > > > > zswap_entry *entry, struct folio *folio) > > > > > > acomp_request_set_params(acomp_ctx->req, &input, &output, > > > entry- > > > > > >length, PAGE_SIZE); > > > > > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > > > > >req), &acomp_ctx->wait)); > > > > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > Actually now that I think more about it, I think this check isn't > > > > > entirely safe, even under the lock. Is it possible that > > > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > > > > decompression at the same handle? In this case, we will also > > > > > mistakenly skip the unmap. > > > > > > > > If we move the mutex_unlock to happen after the conditional and > unmap, > > > > shouldn't that be sufficient under all conditions? With the fix, "src" can > > > > take on only 2 values in this procedure: the mapped handle or > > > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > > > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. > > > > > > Yes, that's the scenario I mean. > > > > > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' > > > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' > > > happens to be equal to the same handle from a previous operation as we > > > don't clear it. > > > > Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(), > > we only copy to it. > > Duh, yes. I confused myself, sorry for the noise. > > > > > > > > > In this case, the 'src != acomp_ctx->buffer' check will be false, even > > > though it should be true. This will result in an extra > > > zpool_unmap_handle() call. I didn't look closely, but this seems like > > > it can have a worse effect than leaking memory (e.g. there will be an > > > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting > > > a random handle). > > > > > > > > > > > Please let me know if I am missing anything. > > > > > > > > > > > > > > It would be more reliable to set a boolean variable if we copy to > > > > > acomp_ctx->buffer and do the unmap, and check that flag here to > check > > > > > if the unmap was done or not. > > > > > > > > Sure, this could be done, but not sure if it is required. Please let me know > > > > if we still need the boolean variable in addition to moving the > > > mutex_unlock(). > > > > > > If we use a boolean, there is no need to move mutex_unlock(). The > > > boolean will be a local variable on the stack that doesn't need > > > protection. > > > > I agree, using the boolean variable to do the unmap rather than the check > > for (src != acomp_ctx->buffer) is more fail-safe. > > > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > Why would this result in an undefined state? For zsmalloc, mapping > uses a per-CPU buffer and preemption is disabled between mapping and > unmapping anyway. If another CPU maps the object it should be fine. > > > > > Would keeping the per-cpu mutex locked through the > > zpool_unmap_handle() create a non-preemptible state that would > > avoid this? IOW, if the above scenario is possible, does the per-cpu > > acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow > > prevented by the implementation of the zpools? > > At least for zsmalloc, I think it is. > > > > > Just thought I would bring up these open questions. Please do share > > your thoughts and advise. > > I think moving the mutex unlock after the unmap won't make much of a > difference from a performance side, at least for zsmalloc. Preemption > will be disabled until the unmap is done anyway, so even after we > release the per-CPU mutex it cannot be acquired by anyone else until > the unmap is done. > > Anyway, I think the fix you have right now is fine, if you prefer not > adding a boolean. If you do add a boolean, whether you move the mutex > unlock or not should not make a difference. Thanks for the guidance on next steps! I will add the boolean and move the mutex. > > Please just rewrite the commit log and CC stable (in the commit log, > not in the email CC list). I guess this refers to adding "Cc: stable@xxxxxxxxxxxxxxx" in the commit log (as found from the latest mainline git log) ? Thanks again, Kanchana