On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > Sent: Tuesday, November 19, 2024 11:51 AM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > > Cc: Chengming Zhou <chengming.zhou@xxxxxxxxx>; Johannes Weiner > > <hannes@xxxxxxxxxxx>; Nhat Pham <nphamcs@xxxxxxxxx>; linux- > > kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; usamaarif642@xxxxxxxxx; > > ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; > > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh > > <vinodh.gopal@xxxxxxxxx> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P > > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > > > > > > > -----Original Message----- > > > > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > > > Sent: Tuesday, November 19, 2024 11:27 AM > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > > > > Cc: Chengming Zhou <chengming.zhou@xxxxxxxxx>; Johannes Weiner > > > > <hannes@xxxxxxxxxxx>; Nhat Pham <nphamcs@xxxxxxxxx>; linux- > > > > kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; usamaarif642@xxxxxxxxx; > > > > ryan.roberts@xxxxxxx; Huang, Ying <ying.huang@xxxxxxxxx>; > > > > 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; Feghali, Wajdi K > > > > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > zswap_decompress(). > > > > > > > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P > > > > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > > > > > Sent: Friday, November 15, 2024 1:49 PM > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > > > > > > Cc: Chengming Zhou <chengming.zhou@xxxxxxxxx>; Johannes Weiner > > > > > > <hannes@xxxxxxxxxxx>; Nhat Pham <nphamcs@xxxxxxxxx>; linux- > > > > > > kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > > usamaarif642@xxxxxxxxx; > > > > > > ryan.roberts@xxxxxxx; Huang, Ying <ying.huang@xxxxxxxxx>; > > > > > > 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; Feghali, Wajdi K > > > > > > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh > > <vinodh.gopal@xxxxxxxxx> > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > zswap_decompress(). > > > > > > > > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > > > > > > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > > > > > > > > > > > Hi Chengming, > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Chengming Zhou <chengming.zhou@xxxxxxxxx> > > > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; > > Johannes > > > > > > Weiner > > > > > > > > <hannes@xxxxxxxxxxx> > > > > > > > > Cc: Nhat Pham <nphamcs@xxxxxxxxx>; Yosry Ahmed > > > > > > > > <yosryahmed@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux- > > > > > > > > mm@xxxxxxxxx; usamaarif642@xxxxxxxxx; > > ryan.roberts@xxxxxxx; > > > > > > Huang, > > > > > > > > Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux- > > > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; > > Gopal, > > > > > > Vinodh > > > > > > > > <vinodh.gopal@xxxxxxxxx> > > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > > > zswap_decompress(). > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > > > > > > > > > >> -----Original Message----- > > > > > > > > >> From: Johannes Weiner <hannes@xxxxxxxxxxx> > > > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > > > > > > > > >> Cc: Nhat Pham <nphamcs@xxxxxxxxx>; Yosry Ahmed > > > > > > > > >> <yosryahmed@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; > > linux- > > > > > > > > >> mm@xxxxxxxxx; chengming.zhou@xxxxxxxxx; > > > > > > usamaarif642@xxxxxxxxx; > > > > > > > > >> ryan.roberts@xxxxxxx; Huang, Ying <ying.huang@xxxxxxxxx>; > > > > > > > > >> 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; Feghali, > > Wajdi K > > > > > > > > >> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh > > > > <vinodh.gopal@xxxxxxxxx> > > > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory > > leak in > > > > > > > > >> zswap_decompress(). > > > > > > > > >> > > > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana > > P > > > > > > wrote: > > > > > > > > >>> So my question was, can we prevent the migration to a > > different > > > > cpu > > > > > > > > >>> by relinquishing the mutex lock after this conditional > > > > > > > > >> > > > > > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > > > > > > > > > Sure, however, is this also applicable to holding the mutex of a > > per- > > > > cpu > > > > > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to > > protect > > > > > > > > this section. > > > > > > > > > > > > > > Ok. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior > > to > > > > > > > > > the migration (in the UAF scenario you described) from being > > > > deleted? > > > > > > > > > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to > > prevent > > > > the > > > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx > > from > > > > being > > > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use > > the > > > > > > > > > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > > > > > > > > > But this is not enough to just avoid using acomp_ctx for the check, > > > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu > > offline > > > > > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > I see. How would the refcounts work? Would this add latency to > > zswap > > > > > > > ops? In low memory situations, could the cpu offlining code over-ride > > > > > > > the refcounts? > > > > > > > > > > > > I think what Johannes meant is that the zswap compress/decompress > > > > > > paths grab a ref on the acomp_ctx before using it, and the CPU > > > > > > offlining code only drops the initial ref, and does not free the > > > > > > buffer directly. The buffer is only freed when the ref drops to zero. > > > > > > > > > > > > I am not familiar with CPU hotplug, would it be simpler if we have a > > > > > > wrapper like get_acomp_ctx() that disables migration or calls > > > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar > > > > > > wrapper, put_acompt_ctx() will be used after we are done using the > > > > > > acomp_ctx. > > > > > > > > > > Would it be sufficient to add a check for mutex_is_locked() in > > > > > zswap_cpu_comp_dead() and if this returns true, to exit without > > deleting > > > > > the acomp? > > > > > > > > I don't think this works. First of all, it's racy. It's possible the > > > > mutex gets locked after we check mutex_is_locked() but before we > > > > delete the acomp_ctx. Also, if we find that the mutex is locked, then > > > > we do nothing and essentially leak the memory. > > > > > > Yes, this would assume the cpu offlining code retries at some interval, > > > which could prevent the memory leak. > > > > I am not sure about that, but even so, it wouldn't handle the first > > scenario where the mutex gets locked after we check mutex_is_locked(). > > > > > > > > > > > > > Second, and probably more important, this only checks if anyone is > > > > currently holding the mutex. What about tasks that may be sleeping > > > > waiting for the mutex to be unlocked? The mutex will be deleted from > > > > under them as well. > > > > > > Wouldn't this and the race described above, also be issues for the > > > refcount based approach? > > > > I don't think so, at least if implemented correctly. There are a lot > > of examples around the kernel that use RCU + refcounts for such use > > cases. I think there are also some examples in kernel docs. > > > > That being said, I am wondering if we can get away with something > > simpler like holding the cpus read lock or disabling migration as I > > suggested earlier, but I am not quite sure. > > Another idea to consider is how zsmalloc avoids this issue through > its use of the local_lock() on the per-cpu mapping area. This disables > preemption from zs_map_object() through zs_unmap_object(). > Would changing the acomp_ctx's mutex to a local_lock solve the > problem? This is similar to disabling migration as I suggested, but disabling preemption means that we cannot sleep, we spin on a lock instead. In zswap_compress(), we send the compression request and may sleep waiting for it. We also make a non-atomic allocation later that may also sleep but that's less of a problem. In zswap_decompress() we may also sleep, which is why we sometimes copy the data into acomp_ctx->buffer and unmap the handle to begin with. So I don't think we can just replace the mutex with a lock. > > > > > > > > > Also, I am wondering if the mutex design already handles cases where > > > tasks are sleeping, waiting for a mutex that disappears? > > > > I don't believe so. It doesn't make sense for someone to free a mutex > > while someone is waiting for it. How would the waiter know if the > > memory backing the mutex was freed? > > Thanks Yosry, all good points. There would need to be some sort of > arbiter (for e.g., the cpu offlining code) that would reschedule tasks > running on a cpu before shutting it down, which could address > this specific issue. I was thinking these are not problems unique to > zswap's per-cpu acomp_ctx->mutex wrt the offlining? There are a few reasons why zswap has this problem and other code may not have it. For example the data structure is dynamically allocated and is freed during offlining, it wouldn't be a problem if it was static. Also the fact that we don't disable preemption when accessing the per-CPU data, as I mentioned earlier, which would prevent the CPU we are running on from going offline while we access the per-CPU data. I think we should either: (a) Use refcounts. (b) Disable migration. (c) Hold the CPUs read lock. I was hoping someone with more knowledge about CPU offlining would confirm (b) and (c) would work, but I am pretty confident they would.