Re: No memory reclaim while reaching MemoryHigh

Sorry if I'm spamming - I'm trying to share as much information as I can:

The difference between varnish and my test is that:
* the varnish cgroup consumes the active_anon type of memory
* my test consumes the inactive_file type of memory

Both get freed by drop_caches, but active_anon does not get freed by
reaching MemoryHigh.
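
To watch which of the two types a cgroup accumulates, something like the
following (assuming the varnish.service cgroup path quoted below) can be
left running:

watch -n5 'grep -E "^(active_anon|inactive_anon|active_file|inactive_file) " /sys/fs/cgroup/system.slice/varnish.service/memory.stat'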

Greets,
Stefan

On 29.07.19 09:07, Stefan Priebe - Profihost AG wrote:
> Hi all,
> 
> It might be that I just misunderstood how it works.
> 
> This test works absolutely fine without any penalty:
> 
> test.sh:
> #####
> #!/bin/bash
> 
> # flush dirty data, then drop page cache, dentries and inodes
> sync
> echo 3 >/proc/sys/vm/drop_caches
> sync
> 
> # read every file on the root filesystem to repopulate the page cache
> time find / -xdev -type f -exec cat "{}" \; >/dev/null 2>/dev/null
> #####
> 
> started with:
> systemd-run -pRemainAfterExit=True -- /root/spriebe/test.sh
> 
> or
> 
> systemd-run --property=MemoryHigh=300M -pRemainAfterExit=True --
> /root/spriebe/test.sh
> 
> In both cases it takes ~1m 45s, even though it consumes about 2G of mem
> in the first case.
> 
> So even though it can only consume a maximum of 300M in the second case,
> it is as fast as the first one without any limit.
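>
> (To verify the cap in the second case, one can watch memory.current of
> the transient unit while the test runs - systemd-run prints the real
> unit name; run-u1234.service below is only a placeholder:)
>
> watch cat /sys/fs/cgroup/system.slice/run-u1234.service/memory.current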
> 
> I thought until today that the same would happen for varnish. Where's
> the difference?
> 
> I also tried stuff like:
> sysctl -w vm.vfs_cache_pressure=1000000
> 
> but the cgroup memory usage of varnish still rises slowly, about 100M
> per hour. The varnish process itself stays constant at ~5.6G.
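>
> To see that growth rate, one can log the cgroup's charge once a minute,
> for example:
>
> while sleep 60; do date; cat /sys/fs/cgroup/system.slice/varnish.service/memory.current; done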
> 
> Greets,
> Stefan
> 
> On 28.07.19 23:11, Stefan Priebe - Profihost AG wrote:
>> here is a memory.stat output of the cgroup:
>> # cat /sys/fs/cgroup/system.slice/varnish.service/memory.stat
>> anon 8113229824
>> file 39735296
>> kernel_stack 26345472
>> slab 24985600
>> sock 339968
>> shmem 0
>> file_mapped 38793216
>> file_dirty 946176
>> file_writeback 0
>> inactive_anon 0
>> active_anon 8113119232
>> inactive_file 40198144
>> active_file 102400
>> unevictable 0
>> slab_reclaimable 2859008
>> slab_unreclaimable 22126592
>> pgfault 178231449
>> pgmajfault 22011
>> pgrefill 393038
>> pgscan 4218254
>> pgsteal 430005
>> pgactivate 295416
>> pgdeactivate 351487
>> pglazyfree 0
>> pglazyfreed 0
>> workingset_refault 401874
>> workingset_activate 62535
>> workingset_nodereclaim 0
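>>
>> Reading the numbers above: anon 8113229824 bytes is ~7.6 GiB, while
>> file is only ~38 MiB and slab_reclaimable ~2.7 MiB - so nearly all of
>> the cgroup's charge is anonymous memory, not cache.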
>>
>> Greets,
>> Stefan
>>
>> On 26.07.19 20:30, Stefan Priebe - Profihost AG wrote:
>>> On 26.07.19 09:45, Michal Hocko wrote:
>>>> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Michal,
>>>>>
>>>>> On 25.07.19 16:01, Michal Hocko wrote:
>>>>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I hope I added the right list and people - if I missed someone, I
>>>>>>> would be happy to know.
>>>>>>>
>>>>>>> While using kernel 4.19.55 and cgroup v2, I set a MemoryHigh value
>>>>>>> for a varnish service.
>>>>>>>
>>>>>>> It happens that the varnish.service cgroup reaches its MemoryHigh
>>>>>>> value and stops working due to throttling.
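>>>>>>>
>>>>>>> For reference, the persistent unit-file equivalent of the runtime
>>>>>>> systemctl set-property calls quoted below is a drop-in like:
>>>>>>>
>>>>>>> [Service]
>>>>>>> MemoryHigh=6.5G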
>>>>>>
>>>>>> What do you mean by "stops working"? Does it mean that the process is
>>>>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>>>>> what the kernel is executing for the process.
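>>>>>>
>>>>>> For example, a few snapshots one second apart (with <pid> as in the
>>>>>> path above):
>>>>>>
>>>>>> for i in $(seq 10); do cat /proc/<pid>/stack; echo ---; sleep 1; done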
>>>>>
>>>>> The service no longer responds to HTTP requests.
>>>>>
>>>>> stack switches in this case between:
>>>>> [<0>] io_schedule+0x12/0x40
>>>>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>>>>> [<0>] filemap_fault+0x42f/0x830
>>>>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>>>>> [<0>] __do_fault+0x57/0x108
>>>>> [<0>] __handle_mm_fault+0x949/0xef0
>>>>> [<0>] handle_mm_fault+0xfc/0x1f0
>>>>> [<0>] __do_page_fault+0x24a/0x450
>>>>> [<0>] do_page_fault+0x32/0x110
>>>>> [<0>] async_page_fault+0x1e/0x30
>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> and
>>>>>
>>>>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>>>>> [<0>] do_sys_poll+0x51e/0x5f0
>>>>> [<0>] __x64_sys_poll+0xe7/0x130
>>>>> [<0>] do_syscall_64+0x5b/0x170
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [<0>] 0xffffffffffffffff
>>>>
>>>> Neither of the two seem to be memcg related.
>>>
>>> Yes, but at least the xfs one is a page fault - isn't this related?
>>>
>>>> Have you tried to get
>>>> several snapshots and see if the backtrace is stable?
>>> No, it's not; it switches most of the time between these two. But as
>>> long as the xfs one with the page fault is seen, it does not serve
>>> requests. That one is seen for at least 1-5s, then the poll one is
>>> visible, and then the xfs one again for 1-5s.
>>>
>>> This happens if i do:
>>> systemctl set-property --runtime varnish.service MemoryHigh=6.5G
>>>
>>> if i set:
>>> systemctl set-property --runtime varnish.service MemoryHigh=14G
>>>
>>> I never get the xfs handle_mm_fault one. This is reproducible.
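>>>
>>> One way to see the throttling kick in at the lower limit: the "high"
>>> counter in the cgroup's memory.events increases every time the high
>>> boundary triggers reclaim:
>>>
>>> grep high /sys/fs/cgroup/system.slice/varnish.service/memory.events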
>>>
>>>> tell you whether your application is stuck in a single syscall or it
>>>> is just progressing very slowly (-ttt parameter should give you timing)
>>>
>>> Yes, it's still going forward, but really slowly due to memory
>>> pressure. memory.pressure of the varnish cgroup shows high values above
>>> 100 or 200.
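>>>
>>> For reference, those values come from the PSI interface of the cgroup:
>>>
>>> cat /sys/fs/cgroup/system.slice/varnish.service/memory.pressure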
>>>
>>> I can reproduce the same with rsync or other tasks using memory for
>>> inodes and dentries. What I don't understand is why the kernel does not
>>> reclaim memory for the userspace process and drop the cache. I can't
>>> believe those entries are hot - they must be at least some days old, as
>>> a fresh process running for a day only consumes about 200MB of inode /
>>> dentry / page cache.
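>>>
>>> A quick check that separates the two cases: echo 2 >/proc/sys/vm/drop_caches
>>> frees only reclaimable slab (dentries and inodes), so if memory.current
>>> barely moves across it, the charge is not dentry/inode cache:
>>>
>>> cat /sys/fs/cgroup/system.slice/varnish.service/memory.current
>>> echo 2 >/proc/sys/vm/drop_caches
>>> cat /sys/fs/cgroup/system.slice/varnish.service/memory.current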
>>>
>>> Greets,
>>> Stefan
>>>



