On Sun, 2024-10-27 at 13:11 +0300, Konstantin Kharlamov wrote:
> On Sat, 2024-10-26 at 23:46 -0700, Yosry Ahmed wrote:
> > On Sat, Oct 26, 2024 at 8:14 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > >
> > > On Sat, Oct 26, 2024 at 5:29 PM Konstantin Kharlamov
> > > <Hi-Angel@xxxxxxxxx> wrote:
> > > >
> > > > That was a good idea! The
> > > > `/sys/fs/cgroup/system.slice/memory.swap.current` seems to have the
> > > > missing half of the SWAP memory. From my understanding of the
> > > > `systemctl status` graph `system.slice` and `user.slice` groups do not
> > > > intersect, and by adding up `system.slice/…` + `user.slice/…` I get
> > > > around 8G.
> > > >
> > > > However, I'm still unclear what does this memory belong to.
> > > > `system.slice/memory.swap.current` is 4.4G currently, that's a lot and
> > > > I'm not seeing anything that could take so much memory.
> > > > I am not very familiar with what usually runs in system.slice.
> > >
> > > I assume you do not have any proactive memory reclaimer? :) I believe
> > > the top utility can display swap usage by process. Have you tried
> > > that?
> > >
> > > There are a couple of edge cases - for instance, if you disable zswap
> > > writeback and zswap at the same time. We will allocate slots on
> > > swapfile, and store it at the page table entry, but we cannot store
> > > the page's content in zswap or the swapfile, so the page remains in
> > > memory. You're occupying swap space, but are not really saving any
> > > memory usage.
> > >
> > > IIRC, there is also an edge case where a page is faulted back into
> > > memory from swap, but the associated swap space cannot be immediately
> > > released. This should be temporary though - memory reclaimer will
> > > attempt to release these pages later on, or they can be released when
> > > we scan the swapfile for slots during swap out.
> >
> > I don't think this is an edge case.
> > I think when we swapin a page we generally leave it in the swapcache
> > if there is no pressure on swap space. In that case the memory is not
> > really swapped out, but because it remains in the swapcache it is
> > still reserving a swap slot, so it shows up as swap usage.
> >
> > Konstantin, could you check the amount of swapcache you have, whether
> > through /proc/vmstat or memory.stat on both user and system slices?
>
> Sure
>
> λ grep cache /sys/fs/cgroup/*/memory.stat
> …
> /sys/fs/cgroup/system.slice/memory.stat:swapcached 434917376
> /sys/fs/cgroup/user.slice/memory.stat:swapcached 15478784
>
> `434917376` is 0.4G, not much. In comparison,
> `system.slice/memory.swap.current` is currently `4764139520 = 4.4G`.

To find entries of a comparable size, I grepped memory.stat for every
value with ten or more digits (roughly 1G and up):

λ grep -P "\d{10}$" /sys/fs/cgroup/system.slice/memory.stat
file 2671874048
shmem 2592768000
zswapped 2997760000
active_anon 1491247104
unevictable 1269555200

Well, to me personally this isn't helpful, but perhaps I am missing
something…
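
In case it helps anyone repeat the check without counting digits by eye, here is a minimal Python sketch of the same filtering. It is only an illustration under assumptions: the helper names (`parse_memory_stat`, `large_entries`, `to_gib`, `load`) are made up for this mail, the sample values are the ones pasted in this thread, and the default threshold of 10**9 bytes simply mirrors the ten-digit grep above. It assumes memory.stat holds one `key value` pair per line, which is what the output above looks like.

```python
from pathlib import Path

GIB = 1024 ** 3
# Path taken from this thread; it may not exist on other systems.
STAT_PATH = "/sys/fs/cgroup/system.slice/memory.stat"

# Values pasted earlier in this thread (system.slice), used as a fallback.
SAMPLE = """\
file 2671874048
shmem 2592768000
zswapped 2997760000
active_anon 1491247104
unevictable 1269555200
swapcached 434917376
"""


def parse_memory_stat(text):
    """Parse 'key value' lines into a dict of ints, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def large_entries(stats, threshold=10 ** 9):
    """Return (key, bytes) pairs at or above threshold, largest first.

    The default threshold keeps values with ten or more decimal digits,
    the same set the grep -P "\\d{10}$" call selects.
    """
    big = [(key, value) for key, value in stats.items() if value >= threshold]
    return sorted(big, key=lambda kv: kv[1], reverse=True)


def to_gib(nbytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return nbytes / GIB


def load(path=STAT_PATH):
    """Read the real memory.stat if possible, else fall back to SAMPLE."""
    try:
        return Path(path).read_text()
    except OSError:
        return SAMPLE


if __name__ == "__main__":
    for key, value in large_entries(parse_memory_stat(load())):
        print(f"{key:<14}{value:>12}  {to_gib(value):.2f} GiB")
```

Sorting largest first puts the biggest consumers at the top, and printing GiB next to the raw byte counts makes the numbers easier to compare against `memory.swap.current`.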