> one host is at a healthy state right now, I'd run that over there immediately.
Let's see what we can get from here.
Oh well, that went fast. Even with low values for buffers (around 100MB) and caches
around 20G or so, the performance was nevertheless super-low, and I really had to drop
the caches just now. This is the first time I've seen it happen with caches >10G, but hopefully
this also provides a clue for you.
Just after starting the stats I reset the THP defrag setting from defer back to madvise. I suspect
that this somehow caused the rapid reaction, since a few minutes later I saw the free RAM jump from
5GB to 10GB. After that I went afk, and only returned to the PC because my monitoring systems went
crazy telling me about downtime.
If you think changing /sys/kernel/mm/transparent_hugepage/defrag back to its default, after it had
been on defer for days, was a mistake, then please tell me.
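In case it helps when comparing hosts, here is a minimal sketch of how the currently active defrag
mode can be read back. It relies on the sysfs file listing all choices with the active one in
square brackets; the helper name active_mode is made up for illustration and is not part of any
kernel tooling.

```python
from pathlib import Path

# Path taken from the message above; only present on kernels built with THP support.
THP_DEFRAG = Path("/sys/kernel/mm/transparent_hugepage/defrag")


def active_mode(text):
    """Return the bracketed (active) entry from a sysfs choice list, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    if THP_DEFRAG.exists():
        # Prints the active mode only, e.g. the word that was inside the brackets.
        print(active_mode(THP_DEFRAG.read_text()))
    else:
        print("THP defrag setting not available on this host")
```

Logging this value next to the vmstat samples would make it easier to see whether the switch lines
up with the jump in free RAM.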
here you go: https://nofile.io/f/VqRg644AT01/vmstat.tar.gz
trace_pipe: https://nofile.io/f/wFShvZScpvn/trace_pipe.gz