On 8/21/18 8:49 AM, Michal Hocko wrote:
> On Tue 21-08-18 02:36:05, Marinko Catovic wrote:
> [...]
>>>> Well, there are some drivers (mostly out-of-tree) which are high order
>>>> hungry. You can try to trace all allocations with order > 0 and
>>>> see who that might be.
>>>> # mount -t tracefs none /debug/trace/
>>>> # echo stacktrace > /debug/trace/trace_options
>>>> # echo "order>0" > /debug/trace/events/kmem/mm_page_alloc/filter
>>>> # echo 1 > /debug/trace/events/kmem/mm_page_alloc/enable
>>>> # cat /debug/trace/trace_pipe
>>>>
>>>> And later, this to disable tracing:
>>>> # echo 0 > /debug/trace/events/kmem/mm_page_alloc/enable
>>>
>>> I just had a major cache-useless situation, with only ~100M/8G usage
>>> and horrible performance. There you go:
>>>
>>> https://nofile.io/f/mmwVedaTFsd
>
> $ grep mm_page_alloc: trace_pipe | sed 's@.*order=\([0-9]*\) .*gfp_flags=\(.*\)@\1 \2@' | sort | uniq -c
>     428 1 __GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_THISNODE
>      10 1 __GFP_HIGH|__GFP_ATOMIC|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
>       6 1 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
>    3061 1 GFP_KERNEL_ACCOUNT|__GFP_ZERO
>    8672 1 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ACCOUNT
>    2547 1 __GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_THISNODE
>       4 2 __GFP_HIGH|__GFP_ATOMIC|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
>       5 2 __GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE
>   20030 2 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ACCOUNT
>    1528 3 GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>    2476 3 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
>    6512 3 GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ACCOUNT
>     277 9 GFP_TRANSHUGE|__GFP_THISNODE
>
> This only covers ~90s of the allocator activity. Most of those requests
> are not triggering any reclaim (GFP_NOWAIT/ATOMIC). Vlastimil will
> know better but this might mean that we are not invoking kcompactd
> enough.

Earlier vmstat data showed that it is invoked, but responsible for less
than 1% of compaction activity.

> But considering that we have suspected that an overly eager
> reclaim triggers the page cache reduction I am not really sure the
> above matches that theory.

Yeah, the GFP_NOWAIT/GFP_ATOMIC allocations above shouldn't be
responsible for such overreclaim?

> Btw. I was probably not specific enough. This data should be collected
> _during_ the time when the page cache is disappearing. I suspect you
> have started collecting after the fact.

It might also be interesting to do the following in the problematic
state, instead of dropping caches:
- save a snapshot of /proc/vmstat and /proc/pagetypeinfo
- echo 1 > /proc/sys/vm/compact_memory
- save a new snapshot of /proc/vmstat and /proc/pagetypeinfo

That would show whether compaction is able to help at all; a rough
command sketch is at the end of this mail.

> Btw. vast majority of order-3 requests come from the network layer. Are
> you using a large MTU (jumbo packets)?
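
For completeness, the snapshot procedure above in command form, run as
root while the machine is in the bad state (the output file names here
are just examples):

# cat /proc/vmstat > vmstat.before
# cat /proc/pagetypeinfo > pagetypeinfo.before
# echo 1 > /proc/sys/vm/compact_memory
# cat /proc/vmstat > vmstat.after
# cat /proc/pagetypeinfo > pagetypeinfo.after

Comparing the before/after snapshots (the compact_* counters in
/proc/vmstat and the per-order free page counts in /proc/pagetypeinfo)
should tell us whether manual compaction can still produce high-order
free pages in that state.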