On Mon, Feb 23, 2015 at 11:45:21AM +1100, Dave Chinner wrote:
> On Sat, Feb 21, 2015 at 06:52:27PM -0500, Johannes Weiner wrote:
> > On Fri, Feb 20, 2015 at 09:52:17AM +1100, Dave Chinner wrote:
> > > I will actively work around anything that causes filesystem memory
> > > pressure to increase the chance of oom killer invocations. The OOM
> > > killer is not a solution - it is, by definition, a loose cannon and
> > > so we should be reducing dependencies on it.
> >
> > Once we have a better-working alternative, sure.
>
> Great, but first a simple request: please stop writing code and
> instead start architecting a solution to the problem. i.e. we need a
> design and have that documented before code gets written. If you
> watched my recent LCA talk, then you'll understand what I mean
> when I say: stop programming and start engineering.

This code was for the sake of argument, see below.

> > > I really don't care about the OOM Killer corner cases - it's
> > > completely the wrong line of development to be spending time on
> > > and you aren't going to convince me otherwise. The OOM killer is a
> > > crutch used to justify having a memory allocation subsystem that
> > > can't provide forward progress guarantee mechanisms to callers that
> > > need it.
> >
> > We can provide this. Are all these callers able to preallocate?
>
> Anything that allocates in transaction context (and therefore is
> GFP_NOFS by definition) can preallocate at transaction reservation
> time. However, preallocation is dumb, complex, CPU and memory
> intensive and will have a *massive* impact on performance.
> Allocating 10-100 pages to a reserve which we will almost *never
> use* and then free them again *on every single transaction* is a lot
> of unnecessary additional fast path overhead. Hence a "preallocate
> for every context" reserve pool is not a viable solution.

You are missing the point of my question.
Whether we allocate right away or make sure the memory is allocatable
later on is a matter of cost, but the logical outcome is the same.
That is not my concern right now.

An OOM killer allows transactional allocation sites to get away
without planning ahead. You are arguing that the OOM killer is a
cop-out on the MM side but I see it as the opposite: it puts a lot of
complexity in the allocator so that callsites can maneuver themselves
into situations where they absolutely need to get memory - or corrupt
user data - without actually making sure their needs will be covered.

If we replace __GFP_NOFAIL + OOM killer with a reserve system, we are
putting the full responsibility on the user. Are you sure this is
going to reduce our kernel-wide error rate?

> And, really, "reservation" != "preallocation".

That's an implementation detail. Yes, the example implementation was
dumb and heavy-handed, but a reservation system that works based on
watermarks, and considers clean cache readily allocatable, is not much
more complex than that.

I'm trying to figure out if the current nofail allocators can get
their memory needs figured out beforehand. And reliably so - what
good are estimates that are right 90% of the time, when failing the
allocation means corrupting user data? What is the contingency plan?
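
For illustration only, the watermark-based reservation mentioned above
could look roughly like the following userspace model. This is a
sketch under stated assumptions, not kernel code and not the example
implementation from this thread: struct mem_pool, pool_reservable(),
reserve_pages(), unreserve_pages() and alloc_reserved() are all
hypothetical names, and clean cache is simply assumed to be
reclaimable without I/O. A reservation here is pure bookkeeping; no
pages are taken until the transaction actually allocates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a memory pool; not a real kernel structure. */
struct mem_pool {
	unsigned long free_pages;    /* immediately allocatable pages */
	unsigned long clean_cache;   /* clean cache, assumed droppable without I/O */
	unsigned long min_watermark; /* pages kept back for everyone else */
	unsigned long reserved;      /* pages promised but not yet allocated */
};

/* Pages that a new reservation could still claim. */
static unsigned long pool_reservable(const struct mem_pool *p)
{
	unsigned long avail = p->free_pages + p->clean_cache;
	unsigned long committed = p->min_watermark + p->reserved;

	return avail > committed ? avail - committed : 0;
}

/*
 * Reserve nr pages without allocating them. On failure the caller can
 * back off before the transaction has started.
 */
static bool reserve_pages(struct mem_pool *p, unsigned long nr)
{
	if (nr > pool_reservable(p))
		return false;
	p->reserved += nr;
	return true;
}

/* Hand back the unused part of a reservation at transaction end. */
static void unreserve_pages(struct mem_pool *p, unsigned long nr)
{
	assert(nr <= p->reserved);
	p->reserved -= nr;
}

/*
 * Allocate nr pages against an existing reservation, dropping clean
 * cache if the free list alone is too short. Returning false means the
 * caller's estimate was too low or the clean-cache assumption did not
 * hold; the model has no answer for that case.
 */
static bool alloc_reserved(struct mem_pool *p, unsigned long nr)
{
	if (nr > p->reserved)
		return false;
	if (nr > p->free_pages) {
		unsigned long need = nr - p->free_pages;

		if (need > p->clean_cache)
			return false;
		p->clean_cache -= need;	/* model reclaiming clean cache */
		p->free_pages += need;
	}
	p->free_pages -= nr;
	p->reserved -= nr;
	return true;
}
```

In this model the common case costs one comparison and one addition
per transaction, which is the difference from preallocating pages up
front. It does nothing to answer the question of whether callers can
size their reservations reliably; alloc_reserved() returning false is
exactly the case that needs a contingency plan.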