On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
> Hi Michal,
>
> On 25.07.19 at 16:01, Michal Hocko wrote:
> > On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
> >> Hello all,
> >>
> >> I hope I added the right list and people - if I missed someone I would
> >> be happy to know.
> >>
> >> While using kernel 4.19.55 and cgroupv2 I set a MemoryHigh value for a
> >> varnish service.
> >>
> >> It happens that the varnish.service cgroup reaches its MemoryHigh value
> >> and stops working due to throttling.
> >
> > What do you mean by "stops working"? Does it mean that the process is
> > stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
> > what the kernel is executing for the process.
>
> The service no longer responds to HTTP requests.
>
> The stack switches in this case between:
> [<0>] io_schedule+0x12/0x40
> [<0>] __lock_page_or_retry+0x1e7/0x4e0
> [<0>] filemap_fault+0x42f/0x830
> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
> [<0>] __do_fault+0x57/0x108
> [<0>] __handle_mm_fault+0x949/0xef0
> [<0>] handle_mm_fault+0xfc/0x1f0
> [<0>] __do_page_fault+0x24a/0x450
> [<0>] do_page_fault+0x32/0x110
> [<0>] async_page_fault+0x1e/0x30
> [<0>] 0xffffffffffffffff
>
> and
>
> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
> [<0>] do_sys_poll+0x51e/0x5f0
> [<0>] __x64_sys_poll+0xe7/0x130
> [<0>] do_syscall_64+0x5b/0x170
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff

Neither of the two seems to be memcg related. Have you tried to take
several snapshots and check whether the backtrace is stable? strace would
also tell you whether your application is stuck in a single syscall or
whether it is just progressing very slowly (the -ttt parameter should give
you timing).

-- 
Michal Hocko
SUSE Labs
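As a rough illustration of the snapshot suggestion above, here is a minimal Python sketch that samples /proc/<pid>/stack several times and counts how often each backtrace shows up. It assumes it is run as root (reading /proc/<pid>/stack normally requires elevated privileges); the helper names `read_kernel_stack`, `sample_stacks`, `top_frame` and `summarize` are hypothetical, not part of any existing tool.

```python
import time
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, List


def read_kernel_stack(pid: int) -> str:
    """Return the kernel backtrace of `pid` from /proc/<pid>/stack.

    Assumption: the caller has the privileges needed to read this file.
    """
    return Path(f"/proc/{pid}/stack").read_text()


def sample_stacks(pid: int, count: int = 10, interval: float = 0.5,
                  reader: Callable[[int], str] = read_kernel_stack) -> List[str]:
    """Take `count` snapshots of the kernel stack, `interval` seconds apart."""
    snapshots = []
    for i in range(count):
        snapshots.append(reader(pid))
        if i + 1 < count:  # no need to sleep after the last snapshot
            time.sleep(interval)
    return snapshots


def top_frame(stack: str) -> str:
    """Return the innermost function name, e.g. 'io_schedule'.

    Strips the '[<0>] ' prefix and the '+offset/size' suffix of the
    first non-empty line; returns '' for an empty stack.
    """
    for line in stack.splitlines():
        line = line.strip()
        if line:
            symbol = line.split("] ", 1)[-1]
            return symbol.split("+", 1)[0]
    return ""


def summarize(snapshots: List[str]) -> Dict[str, int]:
    """Count how often each distinct full backtrace was seen."""
    return dict(Counter(s.strip() for s in snapshots))
```

For example, `summarize(sample_stacks(1234, count=20))` (with 1234 standing in for the varnish PID) shows whether a single backtrace dominates, which would suggest the task is stuck there; if the samples keep changing, the task is more likely making slow progress, and `strace -ttt -p <pid>` can confirm this by showing the timestamps between syscalls.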