Hi!

> > > > Do we agree that OOM killer should have reacted way sooner?
> > >
> > > This is impossible to answer without knowing what was going on at the
> > > time. Was the system threshing over page cache/swap? In other words, is
> > > the system completely out of memory or refaulting the working set all
> > > the time because it doesn't fit into memory?
> >
> > Swap was full, so "completely out of memory", I guess. Chromium does
> > that fairly often :-(.
>
> The oom heuristic is based on the reclaim failure. If the reclaim makes
> some progress then the oom killer is not hit. Have a look at
> should_reclaim_retry for more details.

Thanks for the pointer. I guess setting MAX_RECLAIM_RETRIES to 1 is not
something you'd recommend? :-).

> > PSI is completely different system, but I guess
> > I should attempt to tweak the existing one first...
>
> PSI is measuring the cost of the allocation (among other things) and
> that can give you some idea on how much time is spent to get memory.
> Userspace can implement a policy based on that and act. The kernel oom
> killer is the last resort when there is really no memory to
> allocate.

So what I'm seeing is a system that is unresponsive, easily for an
hour. Sometimes I'm able to log in. When I could do that, the system
was absurdly slow, like ps printing at more than 10 seconds per line.
ps on my system takes 300msec; my estimate for the slow case would be
2000 seconds, which is a slowdown by a factor of roughly 6000x. That
would be an X terminal opening in like two hours... that's not really
usable.

DRAM is in the 100nsec range, disk is in the 10msec range; so the
worst case slowdown is somewhere in the 100000x range. (Actually, in
the worst case userland will make no progress at all, since you can
need 4+ pages in a single CPU instruction, right?)

But the kernel is happy; the system is unusable and will stay unusable
for an hour or more, and there's not much the user can do. (Besides
sysrq, thanks for the hint).

Can we do better?
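As a sanity check on the figures above, here is a small Python sketch of the arithmetic. It only uses the rough numbers from the text (they are estimates, not measurements), and the "1 second" normal start-up time for an X terminal is my own assumption, chosen because it is what the "two hours" figure seems to imply.

```python
# Back-of-envelope check of the slowdown figures quoted above.
# All inputs are rough numbers from the text, not measurements.

def slowdown(slow_seconds: float, normal_seconds: float) -> float:
    """Ratio of the degraded run time to the normal run time."""
    return slow_seconds / normal_seconds

# ps: about 0.3 s normally, estimated 2000 s while the system is thrashing.
ps_factor = slowdown(2000.0, 0.3)

# Worst case: every DRAM access (~100 ns) turns into a disk access (~10 ms).
worst_case_factor = slowdown(10e-3, 100e-9)

# ASSUMPTION: an X terminal normally opens in about 1 s.
xterm_normal_seconds = 1.0
xterm_hours = ps_factor * xterm_normal_seconds / 3600.0

print(f"ps slowdown:          ~{ps_factor:,.0f}x")
print(f"worst-case slowdown:  ~{worst_case_factor:,.0f}x")
print(f"X terminal start-up:  ~{xterm_hours:.1f} hours")
```

The ps ratio comes out a little above 6000x, so the factor quoted above is on the conservative side.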
This is the equivalent of a system crash, and it is _way_ too easy to
trigger. Should we do better by default? Dunno. If the user moved the
mouse and the cursor did not move for 10 seconds, perhaps it is time
for an oom kill?

Or should I add more swap? Is it terrible to place swap on an SSD?

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
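For reference, a minimal sketch of what a userspace policy based on PSI could look like, in Python. It assumes a kernel built with CONFIG_PSI, which exposes /proc/pressure/memory with "some" and "full" lines. The names parse_psi, should_act and watch are hypothetical, and the 50% threshold over the 10-second window is a guess picked to match the "10 seconds" idea above, not a recommended value.

```python
import time

PSI_MEMORY = "/proc/pressure/memory"  # present only with CONFIG_PSI enabled


def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def should_act(psi, full_avg10_limit=50.0):
    """Hypothetical policy: act when all non-idle tasks were stalled on
    memory for more than full_avg10_limit percent of the last 10 seconds."""
    return psi.get("full", {}).get("avg10", 0.0) > full_avg10_limit


def watch(interval=1.0):
    """Poll PSI forever and report when the hypothetical limit is crossed."""
    while True:
        with open(PSI_MEMORY) as f:
            psi = parse_psi(f.read())
        if should_act(psi):
            print("memory pressure above limit:", psi["full"])
            # A real daemon would pick a victim and kill it here; left out.
        time.sleep(interval)
```

Calling watch() on a PSI-enabled kernel would start the polling loop. Polling is the simplest thing to show; the PSI interface also supports triggers that can be waited on with poll(), which would avoid waking up every second.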