[Please do not top-post]

On Mon 06-08-18 12:29:43, Marinko Catovic wrote:
> > Maybe a memcg with kmemcg limit? Michal could know more.
>
> Could you/Michal explain this perhaps?

The only way I can think of that a kmemcg limit could help would be to
enforce metadata reclaim much more often. But that is rather a bad
workaround.

> The hardware is pretty much high end datacenter grade, I really would
> not know how this is to be related with the hardware :(

Well, there are some drivers (mostly out-of-tree) which are high-order
hungry. You can try to trace all allocations with order > 0 and see who
that might be; with the stacktrace option enabled, each event is logged
together with the allocating call chain.

# mount -t tracefs none /debug/trace/
# echo stacktrace > /debug/trace/trace_options
# echo "order>0" > /debug/trace/events/kmem/mm_page_alloc/filter
# echo 1 > /debug/trace/events/kmem/mm_page_alloc/enable
# cat /debug/trace/trace_pipe

And later this to disable tracing:
# echo 0 > /debug/trace/events/kmem/mm_page_alloc/enable

> I do not understand why apparently the caching is working very much
> fine for the beginning after a drop_caches, then degrades to low usage
> somewhat later.

Because a lot of FS metadata is fragmenting the memory, and the large
number of high-order allocations that want to be served have to reclaim
a lot of memory to achieve their goal. Considering that a large part of
memory is fragmented by unmovable objects, there is no other way to
release that memory than to use reclaim.

> I can not possibly drop caches automatically, since
> this requires monitoring for overload with temporary dropping traffic
> on specific ports until the writes/reads cool down.

You do not have to drop all caches. echo 2 > /proc/sys/vm/drop_caches
should be sufficient to drop metadata only.
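
If the raw trace_pipe output turns out to be too verbose to read
directly, a quick summary of which allocation orders dominate can help
narrow things down. This is just a rough sketch, not part of the steps
above; the 60-second capture window and the /tmp/alloc.trace file name
are arbitrary choices:

# timeout 60 cat /debug/trace/trace_pipe > /tmp/alloc.trace
# grep -o 'order=[0-9]*' /tmp/alloc.trace | sort | uniq -c | sort -rn

The counts point at the orders worth investigating; the stack traces in
the capture file then show which callers request them.
--
Michal Hocko
SUSE Labs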