[Please do not top-post]

On Mon 06-08-18 12:29:43, Marinko Catovic wrote:
> > Maybe a memcg with kmemcg limit? Michal could know more.
>
> Could you/Michal explain this perhaps?

The only way I can think of that a kmemcg limit could help would be to
enforce metadata reclaim much more often. But that is rather a bad
workaround.

> The hardware is pretty much high end datacenter grade, I really would
> not know how this is to be related with the hardware :(

Well, there are some drivers (mostly out-of-tree) which are high-order
hungry. You can try to trace all allocations with order > 0 and see who
that might be; with the stacktrace option enabled, each event is logged
together with the allocating call chain.

# mount -t tracefs none /debug/trace/
# echo stacktrace > /debug/trace/trace_options
# echo "order>0" > /debug/trace/events/kmem/mm_page_alloc/filter
# echo 1 > /debug/trace/events/kmem/mm_page_alloc/enable
# cat /debug/trace/trace_pipe

And later this to disable tracing:
# echo 0 > /debug/trace/events/kmem/mm_page_alloc/enable

> I do not understand why apparently the caching is working very much
> fine for the beginning after a drop_caches, then degrades to low usage
> somewhat later.

Because a lot of FS metadata is fragmenting the memory, and the large
number of high-order allocations that want to be served have to reclaim
a lot of memory to achieve their goal. Considering that a large part of
memory is fragmented by unmovable objects, there is no other way to
release that memory than to use reclaim.

> I can not possibly drop caches automatically, since
> this requires monitoring for overload with temporary dropping traffic
> on specific ports until the writes/reads cool down.

You do not have to drop all caches. echo 2 > /proc/sys/vm/drop_caches
should be sufficient to drop metadata only.
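
If the raw trace_pipe output turns out to be too verbose to read
directly, a quick summary of which allocation orders dominate can help
narrow things down. This is just a rough sketch, not part of the steps
above; the 60-second capture window and the /tmp/alloc.trace file name
are arbitrary choices:

# timeout 60 cat /debug/trace/trace_pipe > /tmp/alloc.trace
# grep -o 'order=[0-9]*' /tmp/alloc.trace | sort | uniq -c | sort -rn

The counts point at the orders worth investigating; the stack traces in
the capture file then show which callers request them.
--
Michal Hocko
SUSE Labs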