Sorry for the late reply. Busy as always...

On Mon 22-10-18 03:19:57, Marinko Catovic wrote:
[...]
> There we go again.
>
> First of all, I set up this monitoring on one host; as a matter of
> fact it did not occur on that single one for days and weeks, so I set
> it up again on all the hosts and it just happened again on another
> one.
>
> This issue is far from over, even after upgrading to the latest
> 4.18.12.
>
> https://nofile.io/f/z2KeNwJSMDj/vmstat-2.zip
> https://nofile.io/f/5ezPUkFWtnx/trace_pipe-2.gz

I cannot download these. I am getting an invalid certificate, and a 403
when ignoring it.

[...]
> Also, I'd like to ask for a workaround until this is fixed someday:
> echo 3 > drop_caches can take a very long time when the host is busy
> with I/O in the background. According to some resources on the net,
> dropping caches keeps operating until some lower threshold is
> reached, which becomes less and less likely when the host is really
> busy. Could someone point out which threshold this is? I was thinking
> of e.g. mm/vmscan.c:
>
> void drop_slab_node(int nid)
> {
>         unsigned long freed;
>
>         do {
>                 struct mem_cgroup *memcg = NULL;
>
>                 freed = 0;
>                 do {
>                         freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
>                 } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
>         } while (freed > 10);
> }
>
> ..would it make sense to increase the 10 here to, for example, 100?
> I could easily adjust this, or any other relevant threshold, since I
> am compiling the kernel in use.
>
> I'd just like drop_caches to be able to finish, as a workaround until
> this issue is fixed. As mentioned, it can take hours on a busy host
> and effectively hangs the host (very low performance), since
> buffers/caches are not used while drop_caches is set to 3, until the
> freeing is finished.

This is worth a separate discussion. Please start a new email thread.
-- 
Michal Hocko
SUSE Labs