On 08/08/2019 19:59, Michal Hocko wrote:
> Well, I am afraid that implementing anything like that in the kernel will lead to many regressions and bug reports. People tend to have very different opinions on when it is suitable to kill a potentially important part of a workload just because memory gets low.
Are you proposing a zero memory reserve, or not having such an option at all? I'm fine with the current default (zero reserve/margin).
I strongly prefer forcing the OOM killer while the system is still running normally. This is not just about preventing stalls: in my limited testing I found the OOM killer on a stalled system rather inaccurate, occasionally killing system services etc. I have had a much better experience with earlyoom.
> The LRU aspect doesn't help much, really. If we are reclaiming the same set of pages because they are needed for the workload to operate, then we are effectively thrashing no matter what kind of replacement policy you are going to use.
In my case it would work fine (my system already works well with earlyoom, and without it, it remains responsive until the last couple hundred MB of RAM).
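To make the "reserve/margin" idea concrete, here is a minimal userspace sketch of such a check. It is not earlyoom's implementation; the function names and the 512 MB margin are assumptions chosen for illustration. The only thing it relies on is the MemAvailable field of /proc/meminfo.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def below_margin(meminfo_text, margin_kb):
    """Return True when MemAvailable has dropped below the margin.

    The margin is a hypothetical policy knob, not a kernel setting.
    """
    return parse_meminfo(meminfo_text)["MemAvailable"] < margin_kb


# Usage on a live system (Linux only):
#   with open("/proc/meminfo") as f:
#       if below_margin(f.read(), margin_kb=512 * 1024):
#           ...  # act before the system stalls
```

What to do once the margin is crossed (which process to kill, if any) is a separate policy decision that this sketch deliberately leaves out.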
> PSI gives you a metric that tells you how much time you spend on memory reclaim. So you can start watching the system from lower utilization already.
I've tested it on a system with 45GB of RAM, an SSD, and swap disabled (my intention was to approximate a worst-case scenario), and PSI didn't really detect the stall before it happened. I can see some activity after reaching ~42GB, but the system remains fully responsive until it suddenly freezes and requires sysrq-f. PSI appears to increase a bit when the system is about to run out of memory, but the change is so small that it would be difficult to set a reliable threshold. I expect the PSI numbers to increase significantly after the stall (I wasn't able to capture them) but, as mentioned above, I was hoping for a solution that would work before the stall.
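Since the avg10/avg60/avg300 fields can stay at 0.00 while the cumulative total= counter (in microseconds) keeps growing, one way to look at the data is to difference total= between samples. The following is only a sketch of that idea; the function names are made up for illustration, and it makes no claim that any particular threshold on the result would be reliable.

```python
def parse_psi(text):
    """Parse the contents of /proc/pressure/<resource> into a nested dict."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] not in ("some", "full"):
            continue  # skip anything that is not a PSI line
        values = dict(item.split("=", 1) for item in parts[1:])
        result[parts[0]] = {
            "avg10": float(values["avg10"]),
            "avg60": float(values["avg60"]),
            "avg300": float(values["avg300"]),
            "total": int(values["total"]),  # cumulative stall time in microseconds
        }
    return result


def stall_fraction(prev, curr, interval_s, kind="some"):
    """Fraction of the sampling interval spent stalled, from the total= delta."""
    delta_us = curr[kind]["total"] - prev[kind]["total"]
    return delta_us / (interval_s * 1_000_000)


# Usage on a live system (Linux with PSI enabled):
#   read /proc/pressure/memory twice, interval_s seconds apart,
#   then call stall_fraction(parse_psi(first), parse_psi(second), interval_s).
```

The interval has to match the actual time between the two reads; with a `sleep 1` loop it is only approximately one second.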
$ while true; do sleep 1; cat /proc/pressure/memory ; done
[starting a test script and waiting for several minutes to fill up memory]
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
some avg10=0.00 avg60=0.00 avg300=0.00 total=10389
full avg10=0.00 avg60=0.00 avg300=0.00 total=6442
some avg10=0.00 avg60=0.00 avg300=0.00 total=18950
full avg10=0.00 avg60=0.00 avg300=0.00 total=11576
some avg10=0.00 avg60=0.00 avg300=0.00 total=25655
full avg10=0.00 avg60=0.00 avg300=0.00 total=16159
some avg10=0.00 avg60=0.00 avg300=0.00 total=31438
full avg10=0.00 avg60=0.00 avg300=0.00 total=19552
some avg10=0.00 avg60=0.00 avg300=0.00 total=44549
full avg10=0.00 avg60=0.00 avg300=0.00 total=27772
some avg10=0.00 avg60=0.00 avg300=0.00 total=52520
full avg10=0.00 avg60=0.00 avg300=0.00 total=32580
some avg10=0.00 avg60=0.00 avg300=0.00 total=60451
full avg10=0.00 avg60=0.00 avg300=0.00 total=37704
some avg10=0.00 avg60=0.00 avg300=0.00 total=68986
full avg10=0.00 avg60=0.00 avg300=0.00 total=42859
some avg10=0.00 avg60=0.00 avg300=0.00 total=76598
full avg10=0.00 avg60=0.00 avg300=0.00 total=48370
some avg10=0.00 avg60=0.00 avg300=0.00 total=83080
full avg10=0.00 avg60=0.00 avg300=0.00 total=52930
some avg10=0.00 avg60=0.00 avg300=0.00 total=89384
full avg10=0.00 avg60=0.00 avg300=0.00 total=56350
some avg10=0.00 avg60=0.00 avg300=0.00 total=95293
full avg10=0.00 avg60=0.00 avg300=0.00 total=60260
some avg10=0.00 avg60=0.00 avg300=0.00 total=101566
full avg10=0.00 avg60=0.00 avg300=0.00 total=64408
some avg10=0.00 avg60=0.00 avg300=0.00 total=108131
full avg10=0.00 avg60=0.00 avg300=0.00 total=68412
some avg10=0.00 avg60=0.00 avg300=0.00 total=121932
full avg10=0.00 avg60=0.00 avg300=0.00 total=77413
some avg10=0.00 avg60=0.00 avg300=0.00 total=140807
full avg10=0.00 avg60=0.00 avg300=0.00 total=91269
some avg10=0.00 avg60=0.00 avg300=0.00 total=170494
full avg10=0.00 avg60=0.00 avg300=0.00 total=110611
[stall, sysrq-f]

Best regards,
ndrw