René J.V. Bertin - 03.09.18, 13:17:
> On Monday September 03 2018 11:29:27 Martin Steigerwald wrote:
> >memory and swap pressure. Until the kernel started to kick out
> >processes
> >with SIGKILL:
> Ah, the OOM killer. I had a whole exchange about this kind of memory
> management a couple of years back. I don't remember the details but
> had a good reason to turn off the feature and just let *alloc calls
> fail. Until I came across an app or two (possibly KWin) that don't
> handle allocation failures. And that's the worse thing about this
> kind of memory management IMHO: people start relying on it and stop
> accounting for the possibility that allocations might fail.
>
> So you're saying the OOM killer didn't first kill the application that
> made impossible request? How is that not wrong?

How would it know which application that would be when multiple
processes allocate a chunk of memory each?

If you have a bank and all people want their money back at the same
time, how would you decide who will not get their money back?

Developers changed the OOM killer fundamentally in Linux kernel 2.6.36.
Before that it tried to guess, but in a totally broken way. For example
it looked at virtual memory size. Now it does not guess. The OOM score
is RSS + swap size, root processes receive a 3% bonus, and that is it.
Oh, and you can adjust the score for processes manually.

The first time I learned about this behavior of the Linux kernel I
thought: WTF?

AFAIR the Solaris kernel does not do that. I am not sure what BSD
kernels like the one of FreeBSD or DragonFly BSD do. They all have
virtual memory managers. They all have similar issues to deal with:
applications allocate (way) more virtual address space than they use
later on.

If you disable overcommitting so that allocations fail instead of the
OOM killer stepping in, it can happen that you cannot start an
application although you have way more physical memory and swap space
free than would be required to run it.
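To illustrate why strict accounting can refuse allocations while memory is still free: in strict mode the kernel compares the total committed address space against a fixed limit, not against what is actually in use. A minimal Python sketch of that limit, assuming the formula from the kernel's overcommit documentation (swap plus overcommit_ratio percent of RAM) and ignoring huge pages and vm.overcommit_kbytes. The function names and the sample sizes are made up for illustration:

```python
def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio=50):
    """Approximate commit limit in KiB for strict overcommit.

    Assumption: limit = swap + RAM * overcommit_ratio / 100.
    Huge pages and vm.overcommit_kbytes are ignored in this sketch.
    """
    return swap_total_kib + mem_total_kib * overcommit_ratio // 100


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


KIB_PER_GIB = 1024 * 1024

# Hypothetical machine: 8 GiB RAM, 4 GiB swap.
ram, swap = 8 * KIB_PER_GIB, 4 * KIB_PER_GIB
for ratio in (50, 90):
    limit = commit_limit_kib(ram, swap, ratio)
    print(f"overcommit_ratio={ratio}: limit {limit / KIB_PER_GIB:.1f} GiB")
```

On a Linux system, feeding the contents of /proc/meminfo to parse_meminfo() gives the "CommitLimit" and "Committed_AS" fields, which can be compared with the estimate.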
atop shows this nicely in its "SWP" line, currently on this ThinkPad
T520:

    | vmcom  12.5G | vmlim  27.7G |

The first is what the kernel has promised to the applications, the
second is the limit as configured, by default half of the physical RAM
plus all of the swap space.

What you can do to allow more memory allocations with the same amount
of physical memory while still keeping overcommit disabled is:

1) Increase swap size.

2) Increase /proc/sys/vm/overcommit_ratio to maybe about 80 or 90 so
that the kernel allows allocating 80 or 90% of the physical memory even
with overcommitting disabled completely.

Even with that, when the ThinkPad T520 still had 8 GiB of RAM, it
happened that I could not start another Firefox in a second Plasma
session, although the machine still had more than enough free physical
memory and swap space.

Just as an example of how crazy this is:

% ps --sort -vsz -axo pid,cmd,pmem,rss,vsz | head
  PID CMD                         %MEM     RSS       VSZ
 2717 /usr/lib/x86_64-linux-gnu/l  0.9  161632 268808920
      ^^^^ this is QtWebEngine:
      /usr/lib/x86_64-linux-gnu/qt5/libexec/QtWebEngineProcess
26677 /usr/bin/baloo_file          0.0    9172 268803636
 2337 /usr/bin/baloo_file         10.7 1742544 268754296
 2891 /usr/bin/kmail -qwindowtitl  3.4  564420   6300680
14405 /usr/bin/amarok              0.8  130332   4659040
 2343 /usr/bin/plasmashell         1.3  212184   4323664
 2469 /usr/bin/akregator -session  1.5  256184   3624412
 2335 /usr/bin/kwin_x11 -session   0.3   58536   3077096
19944 /usr/lib/firefox/firefox     3.8  618412   2780652

Do you see baloo_file? It allocated 268754296 KiB of virtual address
space, that is about 256 GiB, but "just" 1742544 KiB of physical memory
(some of that shared with other processes!), that is about 1701 MiB.
Still a lot if you ask me. I am not sure why it uses that much physical
memory on this machine.
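The unit conversions above are easy to double-check. A small Python sketch, using only the VSZ and RSS figures of process 2337 from the ps output (ps reports both in KiB). The helper names are made up for illustration:

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert KiB to GiB as a float."""
    return kib / KIB_PER_GIB


def kib_to_whole_mib(kib):
    """Convert KiB to MiB, dropping the fractional part."""
    return kib // KIB_PER_MIB


# Figures for baloo_file (PID 2337) taken from the ps output above.
vsz_kib = 268754296
rss_kib = 1742544

print(f"VSZ: {kib_to_gib(vsz_kib):.1f} GiB")
print(f"RSS: {kib_to_whole_mib(rss_kib)} MiB")
print(f"resident share of the address space: {rss_kib / vsz_kib:.2%}")
```

The last line puts the two numbers in relation: well under one percent of the reserved address space is actually resident.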
I have no idea how it was able to allocate that much address space, as
with stress I was not able to allocate 30 GiB of virtual address space
in one go. But it appears those almost 256 GiB are even contiguous
address space, according to:

% pmap -x 2337
2337:   /usr/bin/baloo_file
Address             Kbytes     RSS   Dirty Mode  Mapping
[…]
00007f5eb0000000 268435456 1643580       0 r--s- index
[…]

I bet it may have allocated these in steps, but anyway, I never really
understood the default heuristic of the Linux kernel regarding
overcommit.

And baloo is not the only one doing such crazy things. QtWebEngine did
too. I have also seen Java virtual machines / Java applications do
that.

Enough of that. Just in case you would like to get rid of the baloo
file indexer, you may want to enable strict overcommit. :)

Ciao,
--
Martin