Hi all,

it might be that I just misunderstood how it works. This test runs
absolutely fine without any penalty:

test.sh:
#####
#!/bin/bash
sync
echo 3 >/proc/sys/vm/drop_caches
sync
time find / -xdev -type f -exec cat "{}" \; >/dev/null 2>/dev/null
#####

started with:

systemd-run -pRemainAfterExit=True -- /root/spriebe/test.sh

or

systemd-run --property=MemoryHigh=300M -pRemainAfterExit=True -- /root/spriebe/test.sh

In both cases it takes ~1m 45s, even though it consumes about 2G of
memory in the first case. So even though it can only consume a maximum
of 300M in the second case, it is as fast as the first one without any
limit. Until today I thought the same would happen for varnish. Where's
the difference? (A sketch for watching such a transient unit's
memory.current and memory.pressure is appended after the quoted thread
below.)

I also tried things like:

sysctl -w vm.vfs_cache_pressure=1000000

but the cgroup memory usage of varnish still rises slowly, by about
100M per hour, while the varnish process itself stays constant at
~5.6G. (See the memory.stat breakdown sketch at the end of this mail.)

Greets,
Stefan

On 28.07.19 at 23:11, Stefan Priebe - Profihost AG wrote:
> here is a memory.stat output of the cgroup:
> # cat /sys/fs/cgroup/system.slice/varnish.service/memory.stat
> anon 8113229824
> file 39735296
> kernel_stack 26345472
> slab 24985600
> sock 339968
> shmem 0
> file_mapped 38793216
> file_dirty 946176
> file_writeback 0
> inactive_anon 0
> active_anon 8113119232
> inactive_file 40198144
> active_file 102400
> unevictable 0
> slab_reclaimable 2859008
> slab_unreclaimable 22126592
> pgfault 178231449
> pgmajfault 22011
> pgrefill 393038
> pgscan 4218254
> pgsteal 430005
> pgactivate 295416
> pgdeactivate 351487
> pglazyfree 0
> pglazyfreed 0
> workingset_refault 401874
> workingset_activate 62535
> workingset_nodereclaim 0
>
> Greets,
> Stefan
>
> On 26.07.19 at 20:30, Stefan Priebe - Profihost AG wrote:
>> On 26.07.19 at 09:45, Michal Hocko wrote:
>>> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>>>> Hi Michal,
>>>>
>>>> On 25.07.19 at 16:01, Michal Hocko wrote:
>>>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I hope I added the right list and people - if I missed someone, I
>>>>>> would be happy to know.
>>>>>>
>>>>>> While using kernel 4.19.55 and cgroupv2, I set a MemoryHigh value
>>>>>> for a varnish service.
>>>>>>
>>>>>> It happens that the varnish.service cgroup reaches its MemoryHigh
>>>>>> value and stops working due to throttling.
>>>>>
>>>>> What do you mean by "stops working"? Does it mean that the process is
>>>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>>>> what the kernel is executing for the process.
>>>>
>>>> The service no longer responds to HTTP requests.
>>>>
>>>> The stack switches in this case between:
>>>> [<0>] io_schedule+0x12/0x40
>>>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>>>> [<0>] filemap_fault+0x42f/0x830
>>>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>>>> [<0>] __do_fault+0x57/0x108
>>>> [<0>] __handle_mm_fault+0x949/0xef0
>>>> [<0>] handle_mm_fault+0xfc/0x1f0
>>>> [<0>] __do_page_fault+0x24a/0x450
>>>> [<0>] do_page_fault+0x32/0x110
>>>> [<0>] async_page_fault+0x1e/0x30
>>>> [<0>] 0xffffffffffffffff
>>>>
>>>> and
>>>>
>>>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>>>> [<0>] do_sys_poll+0x51e/0x5f0
>>>> [<0>] __x64_sys_poll+0xe7/0x130
>>>> [<0>] do_syscall_64+0x5b/0x170
>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [<0>] 0xffffffffffffffff
>>>
>>> Neither of the two seems to be memcg related.
>>
>> Yes, but at least the xfs one is a page fault - isn't this related?
>>
>>> Have you tried to get several snapshots and see if the backtrace is
>>> stable?
>>
>> No, it's not stable; it switches between these two most of the time.
>> But as long as the xfs one with the page fault is visible, the service
>> does not serve requests; that one is seen for at least 1-5s, then the
>> poll one is visible, then the xfs one again for 1-5s. (A small
>> sampling loop for such snapshots is sketched after this thread.)
>>
>> This happens if I do:
>> systemctl set-property --runtime varnish.service MemoryHigh=6.5G
>>
>> If I set:
>> systemctl set-property --runtime varnish.service MemoryHigh=14G
>>
>> I never get the xfs handle_mm_fault one. This is reproducible.
>>
>>> tell you whether your application is stuck in a single syscall or they
>>> are just progressing very slowly (-ttt parameter should give you timing)
>>
>> Yes, it's still making progress, but really slowly due to memory
>> pressure. memory.pressure of the varnish cgroup shows high values,
>> above 100 or 200.
>>
>> I can reproduce the same with rsync or other tasks that use memory for
>> inodes and dentries. What I don't understand is that the kernel does
>> not reclaim memory for the userspace process and drop the cache. I
>> can't believe those entries are hot - they must be at least some days
>> old, since a fresh process running for a day only consumes about 200MB
>> of inode / dentry / page cache.
>>
>> Greets,
>> Stefan
>>
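A minimal sketch for the test.sh comparison above: it runs the script as
a named transient unit and samples the cgroup's memory.current and
memory.pressure once per second. The unit name "memtest" is a
hypothetical choice for this sketch, and it assumes a kernel that
exposes the PSI memory.pressure file, which the thread suggests is
available:

#####
#!/bin/bash
# Start the test under a MemoryHigh limit as a named transient unit
# ("memtest" is a hypothetical name, not from the original mails).
systemd-run --unit=memtest -pMemoryHigh=300M -- /root/spriebe/test.sh
CG=/sys/fs/cgroup/system.slice/memtest.service
# Sample usage and memory stall averages once per second until the
# unit deactivates (which it does when find/cat finishes).
while systemctl is-active --quiet memtest.service; do
    awk '{ printf "current: %.1f MiB\n", $1 / 1048576 }' "$CG/memory.current" 2>/dev/null
    grep '^some' "$CG/memory.pressure" 2>/dev/null
    sleep 1
done
#####

If memory.current plateaus near 300M while the runtime stays at
~1m 45s, the limit is being enforced by dropping the freshly read page
cache, which is cheap to reclaim.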
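For the stack snapshots discussed above, a minimal sampling loop; it
assumes the main process is named "varnishd" and that the script runs
as root (reading /proc/<pid>/stack requires that):

#####
#!/bin/bash
# Take 30 timestamped snapshots, one second apart, of the kernel stack
# of a varnishd process ("varnishd" as process name is an assumption).
PID=$(pidof -s varnishd)
for i in $(seq 1 30); do
    echo "=== $(date +%T) ==="
    cat "/proc/$PID/stack"
    sleep 1
done
#####

The timestamps make it easy to see for how many consecutive samples the
filemap_fault stack stays visible.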
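And a rough breakdown of the memory.stat output quoted above, to see
what the slow ~100M/h growth consists of. The field names are taken
from the v2 memory.stat shown in the thread; the awk formatting itself
is just illustrative:

#####
#!/bin/bash
# Print the major memory.stat components of the varnish cgroup in MiB:
# anonymous memory, page cache, and reclaimable/unreclaimable slab.
STAT=/sys/fs/cgroup/system.slice/varnish.service/memory.stat
awk '$1 ~ /^(anon|file|slab_reclaimable|slab_unreclaimable)$/ {
         printf "%-20s %10.1f MiB\n", $1, $2 / 1048576
     }' "$STAT"
#####

With the numbers quoted above this prints roughly 7738 MiB anon against
~38 MiB file and ~2.7 MiB reclaimable slab, i.e. almost everything
under the limit is anonymous memory rather than cache.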