> Indeed. From the point of view of the OS, it's running everything on
> the system without a problem. It's deep into swap, but it's running.

Watchdogs can help here.

> If there are application requirements on grade-of-service, it's up to
> the application to check whether those are being met and if not to do
> something about it.

Or it can request such a level of service from the kernel, using the various memory control interfaces the kernel provides but distributors do not enable in their default configurations.

In particular, you can tell the kernel to stop the system from reaching the point where it runs nearly out of memory + swap and begins to thrash horribly. For many workloads you will need a lot of what is pretty much excess swap, but disk is cheap. It's like banking: you can either pretend it's safe, in which case you do impressions of the US banking system now and then and the government has to reboot it, or you can follow traditional banking models, where you hold a reserve sufficient to cover the worst case and still make progress.

Our zero overcommit mode isn't specifically aimed at the page-rate problem, but it is sufficiently related that it usually does the trick.

http://opsmonkey.blogspot.com/2007/01/linux-memory-overcommit.html

I would, btw, disagree strongly that this is a "sorry, we can't help" situation. Back when memory was scarce and systems habitually ran at high memory loads, 4.2 and 4.3BSD coped just fine with very high fault rates that make modern systems curl up and die. That was entirely down to having good paging and swap policies linked to scheduling behaviour, so they always made progress. Your latency went through the roof, but work got done. That meant that if it was a transient load, the system would feel like treacle and then perk up again, whereas these days it seems to be the fashion of most OSes to just explode messily.
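Returning to the zero overcommit mode mentioned earlier: on Linux this is strict overcommit accounting (`vm.overcommit_memory = 2`), in which the kernel refuses new commitments once the committed address space would exceed a limit built from swap plus a fraction of RAM. That limit is the "reserve" in the banking analogy. The sketch below is a minimal illustration, assuming the `CommitLimit` formula documented for `/proc/meminfo` (RAM minus huge pages, times `vm.overcommit_ratio` / 100, plus swap), computed in kB rather than pages and ignoring `vm.overcommit_kbytes`. The helper names `parse_meminfo`, `commit_limit_kb` and `commit_headroom_kb` are hypothetical, and the sample figures are made up.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style lines ("Key:   1234 kB") into {key: number}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int = 50, hugetlb_kb: int = 0) -> int:
    """Approximate CommitLimit under strict overcommit (mode 2):
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    Assumption: vm.overcommit_kbytes is unset (it replaces the ratio if set)."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def commit_headroom_kb(meminfo: Dict[str, int], overcommit_ratio: int = 50) -> int:
    """How much more could be committed before mode 2 starts refusing
    allocations; a negative value means the limit is already exceeded."""
    limit = commit_limit_kb(meminfo["MemTotal"], meminfo["SwapTotal"],
                            overcommit_ratio, meminfo.get("Hugetlb", 0))
    return limit - meminfo["Committed_AS"]


# Made-up sample figures, for illustration only.
SAMPLE = """\
MemTotal:        8000000 kB
SwapTotal:       4000000 kB
Committed_AS:    6500000 kB
Hugetlb:               0 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(commit_limit_kb(info["MemTotal"], info["SwapTotal"]))  # 8000000
    print(commit_headroom_kb(info))                              # 1500000
```

Every extra kB of swap raises the limit one-for-one, which is why "a lot of pretty much excess swap" buys headroom under this mode. On a real Linux system one could feed `parse_meminfo` the contents of `/proc/meminfo` and compare the result with the kernel's own `CommitLimit` line.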
In particular, the BSD systems did two things:

- They actively tried to swap out all the bits of entire victim processes, to make space to do work under very high load.
- When a process was pulled in, it got time to run before it (as opposed to someone else) got dumped out.

That has two good effects. Firstly, the system could write out the process data very efficiently and get it back likewise. Secondly, the system ended up in a cycle of: kick one out, do work in the space we have to breathe, stop, kick the next one out, do work. In most cases it had little CPU contention, so it could make good progress in each burst, albeit at a high latency cost.

Alan
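The two BSD policies listed above can be illustrated with a toy, single-CPU simulation. This is a minimal sketch under heavily simplified assumptions, not how 4.2/4.3BSD actually implemented it: each process is one fixed-size image that is either fully resident or fully swapped out, time is counted in abstract ticks, victims are chosen oldest-first among processes that have already run, and the names `Proc` and `run_whole_process_swapping` are hypothetical. The invariant it demonstrates is that a process swapped in always gets a full quantum before it can become a victim, so every turn makes progress even when memory is badly oversubscribed.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(eq=False)  # identity comparison, so deque.remove() picks the right object
class Proc:
    name: str
    size: int                # pages for the whole process image (hypothetical unit)
    work: int                # CPU ticks of work remaining
    resident: bool = False
    ran_since_swapin: bool = False


def run_whole_process_swapping(procs: List[Proc], capacity: int,
                               quantum: int) -> Dict[str, int]:
    """Toy model: evict whole victim processes, and let a freshly
    swapped-in process run a full quantum before it can be evicted."""
    if any(p.size > capacity for p in procs):
        raise ValueError("every process must fit in memory on its own")
    free = capacity
    stats = {"ticks": 0, "swap_ins": 0, "swap_outs": 0}
    queue: Deque[Proc] = deque(procs)       # round-robin run queue
    resident: Deque[Proc] = deque()         # resident set, oldest swap-in first
    while queue:
        p = queue.popleft()
        if not p.resident:
            # Kick out entire victims until p fits. Only processes that have
            # already had their turn are eligible; since one process runs at a
            # time, every resident process has run by the time we get here.
            while free < p.size:
                victim = next(v for v in resident if v.ran_since_swapin)
                resident.remove(victim)
                victim.resident = False
                free += victim.size
                stats["swap_outs"] += 1
            p.resident, p.ran_since_swapin = True, False
            free -= p.size
            resident.append(p)
            stats["swap_ins"] += 1
        # Guaranteed burst of work for p (or until it finishes).
        step = min(quantum, p.work)
        p.work -= step
        p.ran_since_swapin = True
        stats["ticks"] += step
        if p.work > 0:
            queue.append(p)
        else:
            resident.remove(p)              # finished: release its memory
            p.resident = False
            free += p.size
    return stats


if __name__ == "__main__":
    jobs = [Proc("a", 60, 30), Proc("b", 60, 30), Proc("c", 60, 30)]
    print(run_whole_process_swapping(jobs, capacity=100, quantum=10))
```

With three 60-page processes sharing 100 pages and a quantum of 10 ticks, all 90 ticks of work complete with 9 whole-process swap-ins and 6 swap-outs; raising the quantum to 30 ticks cuts that to 3 swap-ins and no swap-outs. A longer guaranteed burst means less swap traffic at the price of higher latency for everyone waiting, which is the trade-off described above.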