On Sat, 2024-10-26 at 10:47 -0700, Yosry Ahmed wrote:
> On Sat, Oct 26, 2024 at 4:33 AM Konstantin Kharlamov
> <Hi-Angel@xxxxxxxxx> wrote:
> >
> > On Fri, 2024-10-25 at 00:50 -0700, Yosry Ahmed wrote:
> > > On Thu, Oct 24, 2024 at 11:41 PM Konstantin Kharlamov
> > > <Hi-Angel@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, 2024-10-24 at 13:47 -0700, Yosry Ahmed wrote:
> > > > > On Thu, Oct 24, 2024 at 6:02 AM Konstantin Kharlamov
> > > > > <Hi-Angel@xxxxxxxxx> wrote:
> > > > > >
> > > > > > When ZSWAP is disabled, the `Zswap` and `Zswapped` in meminfo are still non-zero.
> > > > > > IOW, ZSWAP doesn't free memory upon being disabled.
> > > > > >
> > > > > > Stumbled upon this while trying to figure out where did ≈4G of my SWAP memory disappear. Been seeing some unknown memory in SWAP for years, now I suspect ZSWAP might be the culprit. But no way to know for sure because of this bug.
> > > > > >
> > > > > > # Steps to reproduce
> > > > > >
> > > > > > 1. Enable ZSWAP
> > > > > > 2. Wait for `grep Zswap /proc/meminfo` to become non-zero
> > > > > > 3. Disable ZSWAP via `sudo sh -c "echo 0 > /sys/module/zswap/parameters/enabled"`
> > > > > > 4. Look at `grep Zswap /proc/meminfo`
> > > > > >
> > > > > > ## Expected
> > > > > >
> > > > > > The rows are zero because ZSWAP is disabled.
> > > > >
> > > > > Not really, the expected behavior is that further swapouts will not go to zswap, but pages that are already compressed in zswap will not be written out to the backing swapfile or swapped back to memory. A swapoff would be required for the latter.
> > > > >
> > > > > This is documented in:
> > > > > https://docs.kernel.org/admin-guide/mm/zswap.html#overview.
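Steps 2 and 4 of the reproduction (reading the `Zswap` rows) can also be scripted. Below is a minimal sketch, assuming a kernel that exposes `Zswap:` (compressed pool size) and `Zswapped:` (original size of the stored pages) in `/proc/meminfo`; the function names are hypothetical. It places those rows next to swap in use (`SwapTotal` minus `SwapFree`), so you can watch the zswap rows stay non-zero after zswap is disabled until a swapoff, as described above.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field:" prefix
        fields = rest.split()
        if fields:
            # Most values are followed by a "kB" unit; keep only the number
            values[name.strip()] = int(fields[0])
    return values


def zswap_summary(meminfo: dict) -> dict:
    """Return zswap usage next to swap currently in use, all in kB."""
    return {
        "swap_used_kb": meminfo["SwapTotal"] - meminfo["SwapFree"],
        # Compressed size of the zswap pool
        "zswap_pool_kb": meminfo.get("Zswap", 0),
        # Uncompressed size of the pages stored in zswap
        "zswapped_kb": meminfo.get("Zswapped", 0),
    }


def zswap_enabled(param: Path = Path("/sys/module/zswap/parameters/enabled")) -> bool:
    """Read the boolean module parameter, which reads back as "Y" or "N"."""
    return param.read_text().strip() == "Y"
```

For example, `zswap_summary(parse_meminfo(Path("/proc/meminfo").read_text()))` returns the three values in kB, and `zswap_enabled()` reads back the parameter toggled in step 3.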
> > > >
> > > > Oh, I see, thank you, sorry for the noise.
> > > >
> > > > Then, I'm curious, is it correct to assume that this `Zswap`-prefixed memory mentioned in meminfo is never the one that is in SWAP? I mean, Zswap being a buffer before data goes to swap kind of implies that yes, the data is *either* in zswap or in swap. But just wanted to hear that explicitly.
> > >
> > > I know this makes sense, but unfortunately no. Zswap is currently transparent to the rest of the system. For all intents and purposes, pages in zswap are considered in swap. You cannot even use zswap without an actual swapfile. So the zswap stats should be a subset of the swap stats.
> > >
> > > FWIW, Nhat is working on restructuring this to have zswap be its own entity, separate from any swapfiles.
> > >
> > > > The background to my question is that I'm trying to find the culprit of some "phantom memory" eventually filling up my SWAP. This memory is not accounted to apps (as calculated via `smem`), nor to tmpfs. So my next suspect was something related to ZSwap.
> > >
> > > As I mentioned, zswap should be transparent to the rest of the system, so it shouldn't make a difference in this case whether the pages are in zswap or in the swapfile.
> > >
> > > You can use the memory.swap.current counter to find out which memory cgroup currently has swapped out pages (in zswap or in the swapfile). This should help find the application that has memory in swap. If you want to find the exact type of memory (e.g. anon vs tmpfs), that would be more tricky. Perhaps you can swapoff and see what counters increase in memory.stat of the relevant memory cgroup?
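Yosry's suggestion to use `memory.swap.current` can be turned into a quick survey of the whole cgroup tree. Below is a minimal sketch, assuming cgroup v2 mounted at `/sys/fs/cgroup`; the function name `swap_by_cgroup` is hypothetical. It walks the hierarchy and ranks cgroups by the swap charged to them. The counter is hierarchical, so a parent such as `user.slice` already includes its children, and the root cgroup has no such file.

```python
import os
from pathlib import Path


def swap_by_cgroup(root: Path = Path("/sys/fs/cgroup"), top: int = 10) -> list:
    """Return up to `top` (bytes, relative cgroup path) pairs, largest first.

    memory.swap.current is hierarchical: a parent's value includes the
    swap charged to all of its descendants.
    """
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.swap.current" not in filenames:
            continue  # e.g. the root cgroup, or memory controller not enabled here
        path = Path(dirpath)
        try:
            value = int((path / "memory.swap.current").read_text())
        except (OSError, ValueError):
            continue  # the cgroup may disappear while we walk the tree
        results.append((value, str(path.relative_to(root))))
    results.sort(reverse=True)
    return results[:top]
```

Calling `swap_by_cgroup()` on a live system lists the heaviest swap users in bytes; drilling into the top entry's children usually narrows things down to a single service or scope.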
> >
> > Thank you, so, I've waited till my SWAP gets almost full again (apparently my new workflow triggers that a lot). It is 7.5G out of 8 in total. 437M is taken by tmpfs'es, let's subtract for simplicity, so I have 7G taken by something else.
>
> If the tmpfs's are created and written to by processes in the user slice, they should show up in memory.swap.current as well.
>
> >
> > Now I'm looking at `/sys/fs/cgroup/user.slice/memory.swap.current` and it's 4422422528 = 4.1G. That's a lot less than 7G. I'm certain this
>
> Can you check the memory.swap.current value of other slices?

That was a good idea! `/sys/fs/cgroup/system.slice/memory.swap.current` seems to hold the missing half of the SWAP memory. From my understanding of the `systemctl status` graph, the `system.slice` and `user.slice` groups do not intersect, and by adding up `system.slice/…` + `user.slice/…` I get around 8G.

However, I'm still unclear what this memory belongs to. `system.slice/memory.swap.current` is 4.4G currently; that's a lot, and I'm not seeing anything that could take so much memory.

An even larger related mystery is why this memory does not show up in `smem` numbers for individual applications (which it calculates by going over `/proc/$pid/smaps` for every pid).

> The other possibility is that the pages are swapped out from the root cgroup, in which case they won't show up in memory.swap.current as they are basically unaccounted. Although typically user processes should not be running in the root cgroup.
>
> > "phantom swap memory" is hidden in `user.slice`, because if I wait till OOM-killer gets triggered and kills some app, my user-systemd gets crashed for some reason, taking down the entire user session, and afterwards SWAP is almost free.
>
> Did you check the OOM logs? It is possible that the OOM killer kills some system process that has some memory in swap as well.

I did, logs are pretty uninteresting.
OOM kills `electron` (of element-desktop), but I tried closing it before the OOM and that didn't have much influence; it's just an arbitrary victim. Then, a few lines later, there's a `Process 560296 (systemd) of user 1000 terminated abnormally with signal 11/SEGV`. I wasn't able to get a stacktrace for systemd with Arch Linux's debuginfo servers. And then everything goes down with systemd.

I just tried closing every application I have open and I still have 5.5G in SWAP. Well, obviously there are services still running (Plasma, i3wm…), but there aren't many suspects left.
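To narrow down the remaining "phantom" swap, one could compare three numbers: swap in use system-wide, the sum of `memory.swap.current` over the top-level cgroups (e.g. `system.slice` + `user.slice`), and the sum of the per-process `Swap:` rows in `/proc/$pid/smaps` that `smem` relies on. Below is a minimal sketch, assuming cgroup v2 at `/sys/fs/cgroup`; the function names are hypothetical, and the comparison is only approximate because the counters are not read atomically. A large gap between the first two numbers would point at swap charged to the root cgroup, as suggested in the thread. Swap that appears in cgroups but not in any process's `smaps` may include memory that no running process maps, such as tmpfs or shmem pages.

```python
from pathlib import Path


def swap_used_kb(meminfo_text: str) -> int:
    """Swap in use system-wide (SwapTotal - SwapFree), in kB."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def top_level_cgroup_swap_kb(root: Path = Path("/sys/fs/cgroup")) -> int:
    """Sum memory.swap.current (bytes) over the root's direct children, in kB."""
    total = 0
    for child in root.iterdir():
        counter = child / "memory.swap.current"
        if child.is_dir() and counter.is_file():
            total += int(counter.read_text())
    return total // 1024


def process_swap_kb(proc: Path = Path("/proc")) -> int:
    """Sum the 'Swap:' rows of every readable /proc/<pid>/smaps, in kB."""
    total = 0
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            text = (entry / "smaps").read_text()
        except OSError:
            continue  # process exited, or smaps not readable without root
        for line in text.splitlines():
            # "SwapPss:" does not match the "Swap:" prefix, so it is not counted
            if line.startswith("Swap:"):
                total += int(line.split()[1])
    return total
```

Run it as root so that every process's `smaps` is readable. `swap_used_kb(Path("/proc/meminfo").read_text()) - top_level_cgroup_swap_kb()` then roughly approximates swap not charged to any top-level cgroup.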