Re: No memory reclaim while reaching MemoryHigh

Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> · Fri, 26 Jul 2019 20:30:35 +0200

On 26.07.19 at 09:45, Michal Hocko wrote:
> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>> Hi Michal,
>>
>> On 25.07.19 at 16:01, Michal Hocko wrote:
>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>> Hello all,
>>>>
>>>> I hope I added the right list and people; if I missed someone, I would
>>>> be happy to know.
>>>>
>>>> While using kernel 4.19.55 and cgroup v2, I set a MemoryHigh value for a
>>>> varnish service.
>>>>
>>>> The varnish.service cgroup reaches its MemoryHigh value and then
>>>> stops working due to throttling.
>>>
>>> What do you mean by "stops working"? Does it mean that the process is
>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>> what the kernel is executing for the process.
>>
>> The service no longer responds to HTTP requests.
>>
>> In this case the stack switches between:
>> [<0>] io_schedule+0x12/0x40
>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>> [<0>] filemap_fault+0x42f/0x830
>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>> [<0>] __do_fault+0x57/0x108
>> [<0>] __handle_mm_fault+0x949/0xef0
>> [<0>] handle_mm_fault+0xfc/0x1f0
>> [<0>] __do_page_fault+0x24a/0x450
>> [<0>] do_page_fault+0x32/0x110
>> [<0>] async_page_fault+0x1e/0x30
>> [<0>] 0xffffffffffffffff
>>
>> and
>>
>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>> [<0>] do_sys_poll+0x51e/0x5f0
>> [<0>] __x64_sys_poll+0xe7/0x130
>> [<0>] do_syscall_64+0x5b/0x170
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [<0>] 0xffffffffffffffff
> 
> Neither of the two seem to be memcg related.

Yes, but at least the XFS one is a page fault; isn't that related?

> Have you tried to get
> several snapshots and see if the backtrace is stable?
No, it is not. Most of the time it switches between these two. As long
as the XFS trace with the page fault is visible, the service does not
serve requests. That trace stays for at least 1-5 s, then the poll one
is visible, and then the XFS one again for 1-5 s.
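To check how stable the backtrace is, the snapshots can be taken in a loop and tallied. A minimal sketch, assuming GNU date and sleep and root access to the stack file; classify_stack and sample_stack are hypothetical helper names, and the two patterns simply match the two traces quoted above:

```shell
# Hypothetical helper: classify one /proc/<pid>/stack snapshot read on stdin.
# Prints "fault" if it contains the filemap_fault frame from the XFS trace,
# "poll" if it contains do_sys_poll, otherwise "other" (fault wins if both).
classify_stack() {
    awk '
        /filemap_fault/ { kind = "fault" }
        /do_sys_poll/   { if (kind == "") kind = "poll" }
        END             { print (kind == "" ? "other" : kind) }
    '
}

# Hypothetical helper: take COUNT snapshots of one task's kernel stack file,
# INTERVAL seconds apart, printing "<epoch.nanoseconds> <class>" per line.
# Reading another task's stack file normally requires root.
sample_stack() {
    stack_file=$1 count=${2:-20} interval=${3:-0.5}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%s.%N)" "$(classify_stack < "$stack_file")"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

varnishd is multithreaded, so per-thread stacks live under /proc/<pid>/task/<tid>/stack. With $PID and $TID as placeholders for a worker thread:
sample_stack /proc/"$PID"/task/"$TID"/stack 40 0.25 | awk '{ print $2 }' | sort | uniq -c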

This happens if I do:
systemctl set-property --runtime varnish.service MemoryHigh=6.5G

If I set:
systemctl set-property --runtime varnish.service MemoryHigh=14G

I never see the XFS handle_mm_fault trace. This is reproducible.
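To confirm what the cgroup actually sees after each set-property call, memory.high can be compared with memory.current, and the high counter in memory.events shows how often the group was throttled and routed into direct reclaim because memory.high was exceeded. A sketch under two assumptions: the unified cgroup v2 hierarchy is mounted at /sys/fs/cgroup (on a hybrid setup it may be /sys/fs/cgroup/unified instead) and the unit lives in system.slice; to_bytes and cg_field are hypothetical helpers:

```shell
# Hypothetical helper: convert a systemd-style size such as "6.5G" or "14G"
# to bytes; systemd uses 1024-based multipliers for K, M, G and T.
to_bytes() {
    printf '%s\n' "$1" | awk '{
        n = $0 + 0                      # numeric prefix, e.g. 6.5
        s = substr($0, length($0))      # last character, e.g. G
        m = (s == "K") ? 1024 : (s == "M") ? 1024^2 : (s == "G") ? 1024^3 : (s == "T") ? 1024^4 : 1
        printf "%.0f\n", n * m
    }'
}

# Hypothetical helper: print the value of KEY from a flat-keyed cgroup file
# such as memory.events ("low 0", "high 1234", "max 0", ...).
cg_field() {
    awk -v k="$1" '$1 == k { print $2 }' "$2"
}

# Assumed cgroup path; override with CG=... if the hierarchy is elsewhere.
CG=${CG:-/sys/fs/cgroup/system.slice/varnish.service}
if [ -r "$CG/memory.high" ]; then
    echo "memory.high    $(cat "$CG/memory.high") (6.5G = $(to_bytes 6.5G))"
    echo "memory.current $(cat "$CG/memory.current")"
    echo "events.high    $(cg_field high "$CG/memory.events")"
fi
```

Running it a few times during a stall shows whether events.high keeps climbing while memory.current sits at the limit.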

> tell you whether your application is stuck in a single syscall or they
> are just progressing very slowly (the -ttt parameter should give you timing)
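strace only shows syscalls, so time spent in a page fault appears as a gap between two consecutive entries of the same thread; that gap also includes the duration of the earlier syscall, which -T reports. A minimal sketch that lists the largest per-thread gaps from `strace -f -ttt -o trace.txt -p <pid>` output; largest_gaps is a hypothetical helper, and it assumes each line carries an optional pid prefix (`1234` or `[pid 1234]`) followed by the epoch timestamp:

```shell
# Hypothetical helper: read `strace -ttt` output on stdin and print the N
# (default 10) largest gaps between consecutive entries of the same thread
# as "<seconds> <tid> <first word of the later entry>", largest first.
# Lines without a pid prefix are grouped under "main".
largest_gaps() {
    awk '
        {
            tid = "main"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+\.[0-9]+$/) break    # the -ttt timestamp
                if ($i ~ /^[0-9]+\]?$/) tid = $i      # "1234" or "1234]"
            }
            if (i > NF) next                          # no timestamp on this line
            sub(/\]$/, "", tid)
            if (tid in last) printf "%.6f %s %s\n", $i - last[tid], tid, $(i + 1)
            last[tid] = $i
        }
    ' | LC_ALL=C sort -rn | head -n "${1:-10}"
}
```

Usage: largest_gaps 20 < trace.txt. Gaps that end in a `<... resumed>` entry are time blocked in that syscall (for example poll) rather than in a fault, so they can be discounted.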

Yes, it is still making progress, but very, very slowly due to memory
pressure. The memory.pressure file of the varnish cgroup shows high
values, above 100 or 200.
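When reading memory.pressure it may help to recall the PSI line format. Each of the some and full lines carries avg10, avg60 and avg300, the percentage of wall-clock time in which at least one task (some) or all non-idle tasks (full) were stalled on memory, plus total, the cumulative stall time in microseconds. A sketch that turns two total readings into a stall percentage over an interval, assuming that standard format; psi_total and psi_stall_pct are hypothetical helpers:

```shell
# Hypothetical helper: print the cumulative stall time in microseconds
# (the total= field) of the "some" or "full" line of a PSI file.
psi_total() {
    awk -v kind="$1" '$1 == kind {
        for (i = 2; i <= NF; i++)
            if (sub(/^total=/, "", $i)) print $i
    }' "$2"
}

# Hypothetical helper: percentage of the next INTERVAL seconds (default 10)
# during which the group was stalled on memory, from two total= readings.
psi_stall_pct() {
    kind=$1 file=$2 interval=${3:-10}
    t0=$(psi_total "$kind" "$file")
    sleep "$interval"
    t1=$(psi_total "$kind" "$file")
    awk -v a="$t0" -v b="$t1" -v s="$interval" \
        'BEGIN { printf "%.1f\n", (b - a) / (s * 1000000) * 100 }'
}
```

Usage, with an assumed cgroup path:
psi_stall_pct full /sys/fs/cgroup/system.slice/varnish.service/memory.pressure 10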

I can reproduce the same with rsync or other tasks that use memory for
inodes and dentries. What I don't understand is why the kernel does not
reclaim memory for the userspace process by dropping the cache. I can't
believe those entries are hot: they must be at least several days old,
since a freshly started process running for only a day consumes about
200MB of inode / dentry / page cache.
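Whether that cache is really cold can be checked from the group's memory.stat: file is the page cache, slab_reclaimable includes the dentry and inode caches, workingset_refault counts refaults of previously evicted pages, and workingset_activate counts refaulted pages that were immediately activated. If workingset_refault climbs quickly during a stall, reclaim is evicting cache that is still in use. A sketch using the cgroup v2 field names; memstat_summary and the cgroup path are assumptions:

```shell
# Hypothetical helper: print selected cgroup v2 memory.stat fields.
# Byte counters are shown in MiB; the workingset_* fields are event counts.
memstat_summary() {
    awk '
        $1 == "anon" || $1 == "file" || $1 == "active_file" ||
        $1 == "inactive_file" || $1 == "slab_reclaimable" {
            printf "%-20s %10.1f MiB\n", $1, $2 / 1048576
        }
        $1 == "workingset_refault" || $1 == "workingset_activate" {
            printf "%-20s %10.0f\n", $1, $2
        }
    ' "$1"
}

# Assumed cgroup path; override with CG=... if the hierarchy is elsewhere.
CG=${CG:-/sys/fs/cgroup/system.slice/varnish.service}
if [ -r "$CG/memory.stat" ]; then
    memstat_summary "$CG/memory.stat"
fi
```

Running it before and during a stall shows whether file and slab_reclaimable shrink at all. As a one-off, system-wide experiment, `sync; echo 2 > /proc/sys/vm/drop_caches` as root frees reclaimable slab objects (dentries and inodes); if slab_reclaimable in the group drops sharply afterwards, that memory was reclaimable.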

Greets,
Stefan