Hi Benjamin,

On Fri, Feb 25, 2022 at 05:10:05PM +0100, Benjamin Berg wrote:
> Hi,
>
> I am seeing memory.swap.current usages for the gnome-shell cgroup that
> seem high if I compare them to smaps_rollup for the contained
> processes. As I don't have an explanation, I thought I would ask here
> (shared memory?).
>
> What I am seeing is (see below, after a tail /dev/zero):
>
> memory.swap.current:
>   686MiB
> "Swap" lines from /proc/$pid/smaps_rollup added up:
>   435MiB
>
> We should be moving launched applications out of the shell cgroup
> before doing execve(), so I think we can rule out that as a possible
> explanation.
>
> I am mostly curious as we currently do swap based kills using
> systemd-oomd. So if swap accounting for GNOME Shell is high, then it
> makes it a more likely target unfortunately.

Shared memory is one option. For example, when you access tmpfs files
with open() read() write() close() instead of mmap().

Another option is swapcache. When swap space is plentiful, the kernel
holds on to the swap copies of pages even after they've been swapped
back in. This way, the next time they need to get "swapped out", it
doesn't require any IO, it can just drop the in-memory copy.

From an smaps POV, swapped in pages are Rss, not Swap. But their swap
copies still contribute to memory.swap.current, hence the discrepancy.

In terms of OOM killing, the kernel will stop keeping swap copies
around when more than half of swap space is used. That should give
plenty of headroom toward the OOM killing thresholds.

If you want to poke around on your machine, here is a drgn script that
tallies up the cache-only swap entries:

---

#!/usr/bin/drgn

MAX_SWAPFILES=25
# swap_map flag: the slot has a page in the swapcache
SWAP_HAS_CACHE=0x40

swapcache=0
for i in range(MAX_SWAPFILES):
    si = prog['swap_info'][i]
    if si:
        for offset in range(si.max.value_()):
            # Exactly SWAP_HAS_CACHE means no other references to the
            # slot: only the swapcache is holding on to it.
            if si.swap_map[offset].value_() == SWAP_HAS_CACHE:
                swapcache += 1
# Assumes 4K pages
print("Cache-only swap space: %.2fM" % (swapcache * 4 / 1024.0))
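
To redo the comparison from the report (memory.swap.current against the
summed "Swap:" lines of smaps_rollup) on a given cgroup, something along
these lines should work. This is only a rough sketch: the function names
are made up for illustration, it assumes cgroup v2, and it only looks at
processes listed directly in that cgroup's cgroup.procs, not in child
cgroups. Pages shared between processes may also be counted more than
once in the smaps total.

```python
#!/usr/bin/env python3
from pathlib import Path


def swap_kib_from_rollup(text):
    """Return the "Swap:" value (in kB) from smaps_rollup contents.

    "SwapPss:" lines are skipped because they do not start with "Swap:".
    """
    for line in text.splitlines():
        if line.startswith("Swap:"):
            return int(line.split()[1])
    return 0


def cgroup_swap_report(cgroup_dir):
    """Return (memory.swap.current, summed per-process Swap), in bytes."""
    cg = Path(cgroup_dir)
    charged = int((cg / "memory.swap.current").read_text())
    mapped_kib = 0
    for pid in (cg / "cgroup.procs").read_text().split():
        try:
            rollup = Path("/proc", pid, "smaps_rollup").read_text()
        except OSError:
            # Process exited in the meantime, or we may not read it.
            continue
        mapped_kib += swap_kib_from_rollup(rollup)
    return charged, mapped_kib * 1024


def format_report(charged, mapped):
    """Format both byte counts and their difference in MiB."""
    mib = 1024.0 * 1024.0
    return ("memory.swap.current: %.0fMiB\n"
            "smaps_rollup Swap:   %.0fMiB\n"
            "difference:          %.0fMiB"
            % (charged / mib, mapped / mib, (charged - mapped) / mib))
```

Call cgroup_swap_report() with the cgroup's directory under
/sys/fs/cgroup and pass the result to format_report(). You will likely
need root to read smaps_rollup for all processes.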