Sorry for the late reply. Busy as always...

On Mon 22-10-18 03:19:57, Marinko Catovic wrote:
[...]
> There we go again.
>
> First of all, I set up this monitoring on one host; as a matter of
> fact it did not occur on that single one for days and weeks, so I set
> it up again on all the hosts and it just happened again on another
> one.
>
> This issue is far from over, even after upgrading to the latest
> 4.18.12.
>
> https://nofile.io/f/z2KeNwJSMDj/vmstat-2.zip
> https://nofile.io/f/5ezPUkFWtnx/trace_pipe-2.gz

I cannot download these. I am getting an invalid certificate, and a 403
when ignoring it.

[...]
> Also, I'd like to ask for a workaround until this is fixed someday:
> echo 3 > drop_caches can take a very long time when the host is busy
> with I/O in the background. According to some resources on the net,
> dropping caches keeps operating until some lower threshold is
> reached, which becomes less and less likely when the host is really
> busy. Could someone point out which threshold this is? I was thinking
> of e.g. mm/vmscan.c:
>
> void drop_slab_node(int nid)
> {
>         unsigned long freed;
>
>         do {
>                 struct mem_cgroup *memcg = NULL;
>
>                 freed = 0;
>                 do {
>                         freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
>                 } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
>         } while (freed > 10);
> }
>
> ..would it make sense to increase the 10 here to, for example, 100?
> I could easily adjust this, or any other relevant threshold, since I
> am compiling the kernel in use.
>
> I'd just like drop_caches to be able to finish, as a workaround until
> this issue is fixed. As mentioned, it can take hours on a busy host
> and effectively hangs the host (very low performance), since
> buffers/caches are not used while drop_caches is set to 3, until the
> freeing is finished.

This is worth a separate discussion. Please start a new email thread.
-- 
Michal Hocko
SUSE Labs