On Mon, 20 Jul 2020 at 05:04, Kevin Kofler <kevin.kofler@xxxxxxxxx> wrote:

John M. Harris Jr wrote:
Userspace isn't dead when a system is thrashing. Your software is still running. If it gets killed, you're most likely going to lose your data.
The thing is, there are various levels of thrashing. In some cases, the system is so busy that you have no chance to bring it back to responsiveness for many minutes, up to hours (other than hitting the Reset or Power button, of course). I have had cases where not even sshd would respond. (The fact that login has been blocking on D-Bus since the introduction of systemd-logind does not help either. Login timeouts just never happened in the past; now they are common under heavy load.)
That said, I do not see how the EarlyOOM heuristic, which allows, depending on the exact settings, something like 80-90% of swap to be used IN ADDITION to 90+% RAM (and only starts doing anything once BOTH RAM and swap are nearly full), can prevent thrashing in any reliable way. My thrashing scenarios have had much less swap than that in use. (I have twice as much swap as RAM, so by the time the EarlyOOM heuristics trigger, my programs are already trying to use almost 3 times as much memory as is actually available!)
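A quick back-of-envelope sketch of that trigger point, assuming earlyoom's upstream defaults of 10% available RAM and 10% free swap (Fedora's shipped settings may differ) and a hypothetical 16 GiB machine with the 2x-RAM swap layout described above:

    /* Back-of-envelope earlyoom trigger point. Assumes the upstream
     * defaults of -m 10 (minimum available RAM %) and -s 10 (minimum
     * free swap %); the 16 GiB machine is hypothetical. */
    #include <stdio.h>

    int main(void) {
        double ram_gib  = 16.0;
        double swap_gib = 2.0 * ram_gib;   /* 2x-RAM swap layout */
        double mem_min  = 0.10;            /* earlyoom -m 10 */
        double swap_min = 0.10;            /* earlyoom -s 10 */

        /* earlyoom acts only once BOTH thresholds are crossed. */
        double ram_used  = ram_gib  * (1.0 - mem_min);
        double swap_used = swap_gib * (1.0 - swap_min);
        double total     = ram_used + swap_used;

        printf("RAM in use at trigger:  %4.1f GiB\n", ram_used);
        printf("swap in use at trigger: %4.1f GiB\n", swap_used);
        printf("total committed:        %4.1f GiB (%.1fx RAM)\n",
               total, total / ram_gib);
        return 0;
    }

At those defaults the kill only fires once roughly 2.7x the machine's RAM is committed, which matches the "almost 3 times" figure above.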
I think the problem is that you are using way too much swap for modern systems. The 2x-to-4x-RAM rule for swap was a 1980s solution, from when a big-RAM system had 64 MB of RAM but a server might need 128 MB for certain tasks. This was 'reasonable' because the processors were slow but could still walk through 128 MB of space 'pretty' fast. As RAM got larger, this 2x rule became 'cargo-culted' in various documentation, and it was still reasonable while processor speed went up: a system with 512 MB of RAM could walk through 1024 to 2048 MB of swap in similar times as the 128 MB case.
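To put rough numbers on that "similar times" claim, here is a sketch with illustrative (not measured) disk throughputs for the two eras:

    /* Rough timing sketch for the "walk through swap" argument; the
     * throughput figures are illustrative guesses, not measurements. */
    #include <stdio.h>

    int main(void) {
        struct { const char *era; double swap_mb, disk_mb_s; } eras[] = {
            { "1980s box, 128 MB swap",     128.0,   1.0 },  /* ~1 MB/s disk  */
            { "late-90s box, 2048 MB swap", 2048.0, 15.0 },  /* ~15 MB/s disk */
        };
        for (int i = 0; i < 2; i++)
            printf("%-30s ~%3.0f s to sweep at %2.0f MB/s\n",
                   eras[i].era, eras[i].swap_mb / eras[i].disk_mb_s,
                   eras[i].disk_mb_s);
        return 0;
    }

Both work out to a couple of minutes for a full sweep, so the larger swap did not feel any worse while disk speeds kept pace.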
Swap sizing was also affected by the way some UNIXes handled commit and fork (the things Linux's heuristic overcommit was designed to handle without excess swap). In the best case, you needed as much swap as RAM, so that a full-size process (64 MB on your 1980s system) could reserve an extra 64 MB in order to fork; in the worst case, things were actually using that swap and you needed some multiple just for commit accounting.
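A minimal demonstration of that fork() commit cost, assuming a Linux box with the default heuristic overcommit (vm.overcommit_memory=0); the 1 GiB size is arbitrary:

    /* Allocate a large private anonymous region, then fork. With Linux's
     * heuristic overcommit the fork succeeds, because the child's copy is
     * copy-on-write and never charged up front. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 1UL << 30;  /* 1 GiB; pick ~half your RAM to see the effect */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        memset(p, 1, len);       /* touch every page so it is really in use */

        /* Strict commit accounting must reserve a second 1 GiB here, in RAM
         * or swap, even though the child's copy is never actually made. */
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) _exit(0);  /* child exits immediately */
        waitpid(pid, NULL, 0);
        puts("fork succeeded; copy-on-write meant no second GiB was used");
        return 0;
    }

Under strict accounting (vm.overcommit_memory=2) with little swap, the same fork() can fail with ENOMEM even though the child never touches its copy.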
IIRC, there were also some systems where the available space for committing was set to the size of swap, hence the need for swap of 2x RAM: one RAM's worth so that committed memory could actually back everything in RAM, and another RAM's worth for the space reserved during a fork.
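Modern Linux still carries an echo of that rule: under vm.overcommit_memory=2, CommitLimit = swap + overcommit_ratio% of RAM (50% by default). A small sketch that just prints the kernel's accounting from /proc/meminfo:

    /* Print the kernel's commit accounting; CommitLimit is the strict
     * ceiling, Committed_AS is what has been promised so far. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) { perror("/proc/meminfo"); return 1; }
        char line[256];
        while (fgets(line, sizeof line, f))
            if (!strncmp(line, "CommitLimit", 11) ||
                !strncmp(line, "Committed_AS", 12))
                fputs(line, stdout);
        fclose(f);
        return 0;
    }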
Linux's use of overcommit means that this isn't an issue for Fedora, though.

--
Simon