Hi all,

it might be that I just misunderstood how it works. This test runs
absolutely fine without any penalty:

test.sh:
#####
#!/bin/bash
sync
echo 3 >/proc/sys/vm/drop_caches
sync
time find / -xdev -type f -exec cat "{}" \; >/dev/null 2>/dev/null
#####

started with:

systemd-run -pRemainAfterExit=True -- /root/spriebe/test.sh

or

systemd-run --property=MemoryHigh=300M -pRemainAfterExit=True -- /root/spriebe/test.sh

In both cases it takes ~1m 45s, even though it consumes about 2G of
memory in the first case. So even though it can only consume a maximum
of 300M in the second case, it is as fast as the first one without any
limit. Until today I thought the same would happen for varnish. Where's
the difference? (A sketch for watching such a transient unit's
memory.current and memory.pressure is appended after the quoted thread
below.)

I also tried things like:

sysctl -w vm.vfs_cache_pressure=1000000

but the cgroup memory usage of varnish still rises slowly, by about
100M per hour, while the varnish process itself stays constant at
~5.6G. (See the memory.stat breakdown sketch at the end of this mail.)

Greets,
Stefan

On 28.07.19 at 23:11, Stefan Priebe - Profihost AG wrote:
> here is a memory.stat output of the cgroup:
> # cat /sys/fs/cgroup/system.slice/varnish.service/memory.stat
> anon 8113229824
> file 39735296
> kernel_stack 26345472
> slab 24985600
> sock 339968
> shmem 0
> file_mapped 38793216
> file_dirty 946176
> file_writeback 0
> inactive_anon 0
> active_anon 8113119232
> inactive_file 40198144
> active_file 102400
> unevictable 0
> slab_reclaimable 2859008
> slab_unreclaimable 22126592
> pgfault 178231449
> pgmajfault 22011
> pgrefill 393038
> pgscan 4218254
> pgsteal 430005
> pgactivate 295416
> pgdeactivate 351487
> pglazyfree 0
> pglazyfreed 0
> workingset_refault 401874
> workingset_activate 62535
> workingset_nodereclaim 0
>
> Greets,
> Stefan
>
> On 26.07.19 at 20:30, Stefan Priebe - Profihost AG wrote:
>> On 26.07.19 at 09:45, Michal Hocko wrote:
>>> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>>>> Hi Michal,
>>>>
>>>> On 25.07.19 at 16:01, Michal Hocko wrote:
>>>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I hope I added the right list and people - if I missed someone, I
>>>>>> would be happy to know.
>>>>>>
>>>>>> While using kernel 4.19.55 and cgroupv2, I set a MemoryHigh value
>>>>>> for a varnish service.
>>>>>>
>>>>>> It happens that the varnish.service cgroup reaches its MemoryHigh
>>>>>> value and stops working due to throttling.
>>>>>
>>>>> What do you mean by "stops working"? Does it mean that the process is
>>>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>>>> what the kernel is executing for the process.
>>>>
>>>> The service no longer responds to HTTP requests.
>>>>
>>>> The stack switches in this case between:
>>>> [<0>] io_schedule+0x12/0x40
>>>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>>>> [<0>] filemap_fault+0x42f/0x830
>>>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>>>> [<0>] __do_fault+0x57/0x108
>>>> [<0>] __handle_mm_fault+0x949/0xef0
>>>> [<0>] handle_mm_fault+0xfc/0x1f0
>>>> [<0>] __do_page_fault+0x24a/0x450
>>>> [<0>] do_page_fault+0x32/0x110
>>>> [<0>] async_page_fault+0x1e/0x30
>>>> [<0>] 0xffffffffffffffff
>>>>
>>>> and
>>>>
>>>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>>>> [<0>] do_sys_poll+0x51e/0x5f0
>>>> [<0>] __x64_sys_poll+0xe7/0x130
>>>> [<0>] do_syscall_64+0x5b/0x170
>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [<0>] 0xffffffffffffffff
>>>
>>> Neither of the two seems to be memcg related.
>>
>> Yes, but at least the xfs one is a page fault - isn't this related?
>>
>>> Have you tried to get several snapshots and see if the backtrace is
>>> stable?
>>
>> No, it's not stable; it switches between these two most of the time.
>> But as long as the xfs one with the page fault is visible, the service
>> does not serve requests; that one is seen for at least 1-5s, then the
>> poll one is visible, then the xfs one again for 1-5s. (A small
>> sampling loop for such snapshots is sketched after this thread.)
>>
>> This happens if I do:
>> systemctl set-property --runtime varnish.service MemoryHigh=6.5G
>>
>> If I set:
>> systemctl set-property --runtime varnish.service MemoryHigh=14G
>>
>> I never get the xfs handle_mm_fault one. This is reproducible.
>>
>>> tell you whether your application is stuck in a single syscall or they
>>> are just progressing very slowly (-ttt parameter should give you timing)
>>
>> Yes, it's still making progress, but really slowly due to memory
>> pressure. memory.pressure of the varnish cgroup shows high values,
>> above 100 or 200.
>>
>> I can reproduce the same with rsync or other tasks that use memory for
>> inodes and dentries. What I don't understand is that the kernel does
>> not reclaim memory for the userspace process and drop the cache. I
>> can't believe those entries are hot - they must be at least some days
>> old, since a fresh process running for a day only consumes about 200MB
>> of inode / dentry / page cache.
>>
>> Greets,
>> Stefan
>>
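A minimal sketch for the test.sh comparison above: it runs the script as
a named transient unit and samples the cgroup's memory.current and
memory.pressure once per second. The unit name "memtest" is a
hypothetical choice for this sketch, and it assumes a kernel that
exposes the PSI memory.pressure file, which the thread suggests is
available:

#####
#!/bin/bash
# Start the test under a MemoryHigh limit as a named transient unit
# ("memtest" is a hypothetical name, not from the original mails).
systemd-run --unit=memtest -pMemoryHigh=300M -- /root/spriebe/test.sh
CG=/sys/fs/cgroup/system.slice/memtest.service
# Sample usage and memory stall averages once per second until the
# unit deactivates (which it does when find/cat finishes).
while systemctl is-active --quiet memtest.service; do
    awk '{ printf "current: %.1f MiB\n", $1 / 1048576 }' "$CG/memory.current" 2>/dev/null
    grep '^some' "$CG/memory.pressure" 2>/dev/null
    sleep 1
done
#####

If memory.current plateaus near 300M while the runtime stays at
~1m 45s, the limit is being enforced by dropping the freshly read page
cache, which is cheap to reclaim.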
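For the stack snapshots discussed above, a minimal sampling loop; it
assumes the main process is named "varnishd" and that the script runs
as root (reading /proc/<pid>/stack requires that):

#####
#!/bin/bash
# Take 30 timestamped snapshots, one second apart, of the kernel stack
# of a varnishd process ("varnishd" as process name is an assumption).
PID=$(pidof -s varnishd)
for i in $(seq 1 30); do
    echo "=== $(date +%T) ==="
    cat "/proc/$PID/stack"
    sleep 1
done
#####

The timestamps make it easy to see for how many consecutive samples the
filemap_fault stack stays visible.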
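And a rough breakdown of the memory.stat output quoted above, to see
what the slow ~100M/h growth consists of. The field names are taken
from the v2 memory.stat shown in the thread; the awk formatting itself
is just illustrative:

#####
#!/bin/bash
# Print the major memory.stat components of the varnish cgroup in MiB:
# anonymous memory, page cache, and reclaimable/unreclaimable slab.
STAT=/sys/fs/cgroup/system.slice/varnish.service/memory.stat
awk '$1 ~ /^(anon|file|slab_reclaimable|slab_unreclaimable)$/ {
         printf "%-20s %10.1f MiB\n", $1, $2 / 1048576
     }' "$STAT"
#####

With the numbers quoted above this prints roughly 7738 MiB anon against
~38 MiB file and ~2.7 MiB reclaimable slab, i.e. almost everything
under the limit is anonymous memory rather than cache.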