Re: [BUG] ZSwap leaks memory upon being disabled

On Sat, Oct 26, 2024 at 4:33 AM Konstantin Kharlamov <Hi-Angel@xxxxxxxxx> wrote:
>
> On Fri, 2024-10-25 at 00:50 -0700, Yosry Ahmed wrote:
> > On Thu, Oct 24, 2024 at 11:41 PM Konstantin Kharlamov
> > <Hi-Angel@xxxxxxxxx> wrote:
> > >
> > > On Thu, 2024-10-24 at 13:47 -0700, Yosry Ahmed wrote:
> > > > On Thu, Oct 24, 2024 at 6:02 AM Konstantin Kharlamov
> > > > <Hi-Angel@xxxxxxxxx> wrote:
> > > > >
> > > > > When ZSWAP is disabled, the `Zswap` and `Zswapped` in meminfo
> > > > > are
> > > > > still non-zero.
> > > > > IOW, ZSWAP doesn't free memory upon being disabled.
> > > > >
> > > > > Stumbled upon this while trying to figure out where did ≈4G of
> > > > > my
> > > > > SWAP memory
> > > > > disappear. Been seeing some unknown memory in SWAP for years,
> > > > > now I
> > > > > suspect ZSWAP
> > > > > might be the culprit. But no way to know for sure because of
> > > > > this
> > > > > bug.
> > > > >
> > > > > # Steps to reproduce
> > > > >
> > > > > 1. Enable ZSWAP
> > > > > 2. Wait for `grep Zswap /proc/meminfo` to become non-zero
> > > > > 3. Disable ZSWAP via `sudo sh -c "echo 0 >
> > > > > /sys/module/zswap/parameters/enabled"`
> > > > > 4. Look at `grep Zswap /proc/meminfo`
> > > > >
> > > > > ## Expected
> > > > >
> > > > > The rows are zero because ZSWAP is disabled.
> > > >
> > > > Not really, the expected behavior is that further swapouts will
> > > > not
> > > > go
> > > > to zswap, but pages that are already compressed in zswap will not
> > > > be
> > > > written out to the backing swapfile or swapped back to memory. A
> > > > swapoff would be required for the latter.
> > > >
> > > > This is documented in:
> > > > https://docs.kernel.org/admin-guide/mm/zswap.html#overview.
> > >
> > > Oh, I see, thank you, sorry for the noise.
> > >
> > > Then, I'm curious, is it correct to assume that this `Zswap`-
> > > prefixed
> > > memory mentioned in meminfo is never the one that is in SWAP? I
> > > mean,
> > > Zswap being a buffer before data goes to swap kind of implies that
> > > yes,
> > > the data is *either* in zswap or in swap. But just wanted to hear that
> > > explicitly.
> >
> > I know this makes sense, but unfortunately no. Zswap is currently
> > transparent to the rest of the system. For all intents and purposes,
> > pages in zswap are considered in swap. You cannot even use zswap without
> > an actual swapfile. So the zswap stats should be a subset of the swap
> > stats.
> >
> > FWIW, Nhat is working on restructuring this to have zswap be its own
> > entity, separate from any swapfiles.
> >
> > >
> > > The background to my question is that I'm trying to find the
> > > culprit
> > > behind some "phantom memory" eventually filling up my SWAP. This memory is
> > > not
> > > one accounted to apps (as calculated via `smem`), nor to tmpfs. So
> > > my
> > > next suspect was something related to ZSwap.
> > > >
> >
> > As I mentioned, zswap should be transparent to the rest of the
> > system,
> > so it shouldn't make a difference in this case whether the pages are
> > in zswap or in the swapfile.
> >
> > You can use the memory.swap.current counter to find out which memory
> > cgroup currently has swapped out pages (in zswap or in the swapfile).
> > This should help find the application that has memory in swap. If you
> > want to find the exact type of memory (e.g. anon vs tmpfs), that
> > would
> > be more tricky. Perhaps you can swapoff and see what counters
> > increase
> > in memory.stat of the relevant memory cgroup?
>
> Thank you, so, I've waited till my SWAP gets almost full again
> (apparently my new workflow triggers that a lot). It is 7.5G out of 8
> in total. 437M is taken by tmpfs'es, let's subtract for simplicity, so
> I have 7G taken by something else.

If the tmpfs's are created and written to by processes in the user
slice, they should show up in memory.swap.current as well.

>
> Now I'm looking at `/sys/fs/cgroup/user.slice/memory.swap.current` and
> it's 4422422528 = 4.1G. That's a lot less than 7G. I'm certain this

Can you check the memory.swap.current value of other slices?

The other possibility is that the pages are swapped out from the root
cgroup, in which case they won't show up in memory.swap.current as
they are basically unaccounted. Although typically user processes
should not be running in the root cgroup.
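
Something along these lines could list memory.swap.current for every
top-level cgroup, largest first. This is only a rough sketch: it assumes
cgroup v2 is mounted at /sys/fs/cgroup and that the memory controller is
enabled for those cgroups, and swap_per_cgroup is a made-up helper name,
not an existing tool.

```shell
#!/bin/sh
# Print "<bytes><TAB><cgroup>" for each direct child of a cgroup v2
# directory, sorted by swap usage in descending order.
# $1: directory to scan (assumed default: /sys/fs/cgroup).
swap_per_cgroup() {
    root="${1:-/sys/fs/cgroup}"
    for f in "$root"/*/memory.swap.current; do
        # Skip children without the memory controller (file is absent),
        # and the unexpanded glob when nothing matches at all.
        [ -r "$f" ] || continue
        printf '%s\t%s\n' "$(cat "$f")" "$(basename "$(dirname "$f")")"
    done | sort -rn
}
```

Running `swap_per_cgroup` with no argument scans the top level; passing a
directory such as /sys/fs/cgroup/user.slice drills down one level, since
the counter of a cgroup includes its descendants.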

> "phantom swap memory" is hidden in `user.slice`, because if I wait till
> OOM-killer gets triggered and kills some app, my user-systemd gets
> crashed for some reason, taking down the entire user session, and
> afterwards SWAP is almost free.

Did you check the OOM logs? It is possible that the OOM killer kills
some system process that has some memory in swap as well.
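
As a rough sketch, the victims could be pulled out of the kernel log like
this. It assumes the kernel's usual "Out of memory: Killed process <pid>
(<name>)" message format, which may differ between kernel versions, and it
will not see kills performed by a userspace OOM daemon. oom_victims is a
made-up helper name.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<pid> <name>" for every
# process the kernel OOM killer reports as killed.
# [Oo] also matches the "Memory cgroup out of memory" variant.
oom_victims() {
    sed -n 's/.*[Oo]ut of memory: Killed process \([0-9]*\) (\([^)]*\)).*/\1 \2/p'
}
```

It can be fed from either `dmesg | oom_victims` or
`journalctl -k | oom_victims`.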

>
> I think this memory.swap.current isn't much different compared to just
> asking `smem` for SWAP taken by individual apps. As of writing the
> words that's 4.6G for the entire system, as calculated by:
>
>         sudo smem -c "name user pid vss pss rss swap" | awk '{total+=$7} END {print "Swap memory: " total "K"}'
>
> So 7 - 4.6 = 2.4G of some "phantom" memory.

I am not sure about smem, but memory.swap.current should be accounting
pages swapped out from all memory cgroups except the root.
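
To put a number on what is left for the root cgroup, one could subtract
the per-cgroup counters from the system-wide swap usage. A minimal sketch,
assuming cgroup v2 at /sys/fs/cgroup, the SwapTotal and SwapFree fields of
/proc/meminfo (reported in kB), and a made-up helper name,
unaccounted_swap_kib:

```shell
#!/bin/sh
# Print the swap in use (KiB) that is not charged to any top-level cgroup.
# $1: meminfo file (assumed default: /proc/meminfo)
# $2: cgroup v2 root (assumed default: /sys/fs/cgroup)
unaccounted_swap_kib() {
    meminfo="${1:-/proc/meminfo}"
    root="${2:-/sys/fs/cgroup}"
    # System-wide swap in use, in KiB.
    used=$(awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2}
                END {printf "%d\n", t - f}' "$meminfo")
    # Sum of the top-level counters, in bytes; each one already
    # includes the swap charged to its descendants.
    accounted=0
    for f in "$root"/*/memory.swap.current; do
        [ -r "$f" ] || continue
        accounted=$((accounted + $(cat "$f")))
    done
    echo $((used - accounted / 1024))
}
```

The two sources are not sampled atomically, so a small difference is just
noise; a gap of gigabytes would point at pages swapped out from the root
cgroup.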




