Sorry if I'm spamming - I try to share as much information as I can.

The difference between varnish and my test is that:
* the varnish cgroup consumes the active_anon type of mem
* my test consumes the inactive_file type of mem

Both get freed by drop_caches, but active_anon does not get freed by
triggering MemoryHigh.

Greets,
Stefan

On 29.07.19 at 09:07, Stefan Priebe - Profihost AG wrote:
> Hi all,
>
> it might be that I just misunderstood how it works.
>
> This test works absolutely fine without any penalty:
>
> test.sh:
> #####
> #!/bin/bash
>
> sync
> echo 3 >/proc/sys/vm/drop_caches
> sync
> time find / -xdev -type f -exec cat "{}" \; >/dev/null 2>/dev/null
> #####
>
> started with:
> systemd-run -pRemainAfterExit=True -- /root/spriebe/test.sh
>
> or
>
> systemd-run --property=MemoryHigh=300M -pRemainAfterExit=True --
> /root/spriebe/test.sh
>
> In both cases it takes ~1m 45s, even though it consumes about 2G of mem
> in the first case.
>
> So even though it can only consume a max of 300M in the 2nd case, it is
> as fast as the first one without any limit.
>
> I thought until today that the same would happen for varnish. Where's
> the difference?
>
> I also tried stuff like:
> sysctl -w vm.vfs_cache_pressure=1000000
>
> but the cgroup memory usage of varnish still rises slowly, about 100M
> per hour. The varnish process itself stays constant at ~5.6G.
>
> Greets,
> Stefan
>
> On 28.07.19 at 23:11, Stefan Priebe - Profihost AG wrote:
>> here is a memory.stat output of the cgroup:
>> # cat /sys/fs/cgroup/system.slice/varnish.service/memory.stat
>> anon 8113229824
>> file 39735296
>> kernel_stack 26345472
>> slab 24985600
>> sock 339968
>> shmem 0
>> file_mapped 38793216
>> file_dirty 946176
>> file_writeback 0
>> inactive_anon 0
>> active_anon 8113119232
>> inactive_file 40198144
>> active_file 102400
>> unevictable 0
>> slab_reclaimable 2859008
>> slab_unreclaimable 22126592
>> pgfault 178231449
>> pgmajfault 22011
>> pgrefill 393038
>> pgscan 4218254
>> pgsteal 430005
>> pgactivate 295416
>> pgdeactivate 351487
>> pglazyfree 0
>> pglazyfreed 0
>> workingset_refault 401874
>> workingset_activate 62535
>> workingset_nodereclaim 0
>>
>> Greets,
>> Stefan
>>
>> On 26.07.19 at 20:30, Stefan Priebe - Profihost AG wrote:
>>> On 26.07.19 at 09:45, Michal Hocko wrote:
>>>> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Michal,
>>>>>
>>>>> On 25.07.19 at 16:01, Michal Hocko wrote:
>>>>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I hope I added the right list and people - if I missed someone, I
>>>>>>> would be happy to know.
>>>>>>>
>>>>>>> While using kernel 4.19.55 and cgroup v2, I set a MemoryHigh value
>>>>>>> for a varnish service.
>>>>>>>
>>>>>>> It happens that the varnish.service cgroup reaches its MemoryHigh
>>>>>>> value and stops working due to throttling.
>>>>>>
>>>>>> What do you mean by "stops working"? Does it mean that the process is
>>>>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell
>>>>>> you what the kernel is executing for the process.
>>>>>
>>>>> The service no longer responds to HTTP requests.
>>>>>
>>>>> The stack switches in this case between:
>>>>> [<0>] io_schedule+0x12/0x40
>>>>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>>>>> [<0>] filemap_fault+0x42f/0x830
>>>>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>>>>> [<0>] __do_fault+0x57/0x108
>>>>> [<0>] __handle_mm_fault+0x949/0xef0
>>>>> [<0>] handle_mm_fault+0xfc/0x1f0
>>>>> [<0>] __do_page_fault+0x24a/0x450
>>>>> [<0>] do_page_fault+0x32/0x110
>>>>> [<0>] async_page_fault+0x1e/0x30
>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> and
>>>>>
>>>>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>>>>> [<0>] do_sys_poll+0x51e/0x5f0
>>>>> [<0>] __x64_sys_poll+0xe7/0x130
>>>>> [<0>] do_syscall_64+0x5b/0x170
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [<0>] 0xffffffffffffffff
>>>>
>>>> Neither of the two seems to be memcg related.
>>>
>>> Yes, but at least the xfs one is a page fault - isn't this related?
>>>
>>>> Have you tried to get
>>>> several snapshots and see if the backtrace is stable?
>>> No, it's not stable; it switches most of the time between these two.
>>> But as long as the xfs one with the page fault is seen, the service
>>> does not serve requests, and that one is visible for at least 1-5s;
>>> then the poll one shows up, and then the xfs one again for 1-5s.
>>>
>>> This happens if I do:
>>> systemctl set-property --runtime varnish.service MemoryHigh=6.5G
>>>
>>> If I set:
>>> systemctl set-property --runtime varnish.service MemoryHigh=14G
>>>
>>> I never get the xfs handle_mm_fault one. This is reproducible.
>>>
>>>> tell you whether your application is stuck in a single syscall or they
>>>> are just progressing very slowly (-ttt parameter should give you timing)
>>>
>>> Yes, it's still making progress, but really slowly due to memory
>>> pressure. memory.pressure of the varnish cgroup shows high values,
>>> above 100 or 200.
>>>
>>> I can reproduce the same with rsync or other tasks using memory for
>>> inodes and dentries. What I don't understand is that the kernel does
>>> not reclaim memory for the userspace process and drop the cache. I
>>> can't believe those entries are hot, as they must be at least some
>>> days old - a fresh process running for a day only consumes about 200MB
>>> of inode / dentry / page cache.
>>>
>>> Greets,
>>> Stefan
>>>
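For reference, a minimal sketch of how the comparison at the top can be
taken. It is only a sketch: it reads the cgroup v2 files already quoted
in this thread - memory.stat, memory.current and memory.pressure - and
the cgroup path assumes the varnish.service layout shown above.

#####
#!/bin/bash
# Sketch: show which memory types a cgroup holds before and after
# drop_caches. Assumes cgroup v2 mounted at /sys/fs/cgroup and the
# varnish.service path quoted above; adjust CG for other units.
CG=/sys/fs/cgroup/system.slice/varnish.service

show() {
    echo "--- $1 ---"
    grep -E '^(anon|file|active_anon|inactive_file) ' "$CG/memory.stat"
    echo "memory.current: $(cat "$CG/memory.current")"
    cat "$CG/memory.pressure"
}

show "before drop_caches"
sync
echo 3 >/proc/sys/vm/drop_caches
show "after drop_caches"
#####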
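And a sketch of the runtime MemoryHigh experiment from the quoted
exchange: it changes the limit with systemctl set-property and then
samples the main PID's kernel stack and the cgroup's memory.pressure,
as was done by hand above. The 6.5G/14G values are simply the ones
mentioned in the thread; run as root.

#####
#!/bin/bash
# Sketch: flip MemoryHigh on varnish.service at runtime, then sample
# the main process's kernel stack and the cgroup's pressure once per
# second. Paths and values are the ones from this thread.
UNIT=varnish.service
CG=/sys/fs/cgroup/system.slice/$UNIT

systemctl set-property --runtime "$UNIT" MemoryHigh=6.5G   # or 14G for comparison
PID=$(systemctl show -p MainPID --value "$UNIT")

for _ in $(seq 1 10); do
    date +%T
    cat "/proc/$PID/stack"      # where the main task currently sits in the kernel
    cat "$CG/memory.pressure"
    sleep 1
done
#####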