On Fri, Feb 20, 2015 at 01:48:49PM +0100, Michal Hocko wrote:
> On Fri 20-02-15 08:43:56, Dave Chinner wrote:
> > On Thu, Feb 19, 2015 at 01:29:14PM +0100, Michal Hocko wrote:
> > > On Thu 19-02-15 06:01:24, Johannes Weiner wrote:
> > > [...]
> > > > Preferably, we'd get rid of all nofail allocations and replace them
> > > > with preallocated reserves.  But this is not going to happen anytime
> > > > soon, so what other option do we have than resolving this on the OOM
> > > > killer side?
> > >
> > > As I've mentioned in other email, we might give GFP_NOFAIL allocator
> > > access to memory reserves (by giving it __GFP_HIGH).
> >
> > Won't work when you have thousands of concurrent transactions
> > running in XFS and they are all doing GFP_NOFAIL allocations.
>
> Is there any bound on how many transactions can run at the same time?

Yes. As many reservations as can fit in the available log space.

The log can be sized up to 2GB, and for filesystems larger than 4TB
will default to 2GB. Log space reservations depend on the operation
being done - an inode timestamp update requires about 5kB of
reservation, and rename requires about 200kB. Hence we can easily
have thousands of active transactions, even in the worst-case log
space reservation cases.

You're saying it would be insane to have hundreds or thousands of
threads doing GFP_NOFAIL allocations concurrently. Reality check:
XFS has been operating successfully under such workload conditions
in production systems for many years.

> > That's why I suggested the per-transaction reserve pool - we can use
> > that
>
> I am still not sure what you mean by reserve pool (API wise). How
> does it differ from pre-allocating memory before the "may not fail
> context"? Could you elaborate on it, please?

It is preallocating memory: into a reserve pool associated with the
transaction, done as part of the transaction reservation mechanism
we already have in XFS.
The allocator then uses that reserve pool to allocate from if an
allocation would otherwise fail.

There is no way we can preallocate specific objects before the
transaction - that's just insane, especially handling the unbounded
demand-paged object requirement. Hence the need for a "preallocated
reserve pool" that the allocator can dip into that covers the memory
we need to *allocate and can't reclaim* during the course of the
transaction.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
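
For illustration only, the per-transaction reserve pool described in
this message can be sketched in userspace C. This is a rough model
under stated assumptions, not XFS or kernel code: struct tx_reserve,
tx_reserve_init(), tx_alloc(), tx_free() and tx_reserve_release() are
hypothetical names, malloc() stands in for the normal allocator, and
the normal_fails flag simulates memory pressure. The point it shows is
the ordering: memory is set aside when the transaction is reserved,
and it is only consumed when a normal allocation would otherwise fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TX_ALIGN ((size_t)16)	/* assumed allocation granularity */

/* Hypothetical per-transaction reserve; not an XFS or kernel structure. */
struct tx_reserve {
	unsigned char *base;	/* memory set aside at reservation time */
	size_t size;		/* total bytes reserved */
	size_t used;		/* bytes handed out so far */
};

/*
 * Reserve memory up front, next to the log space reservation. This is
 * the one place where failing is still safe, because the transaction
 * has not modified anything yet.
 */
static bool tx_reserve_init(struct tx_reserve *res, size_t bytes)
{
	res->base = malloc(bytes);
	res->size = res->base ? bytes : 0;
	res->used = 0;
	return res->base != NULL;
}

/* True if p points into the reserve rather than the normal heap. */
static bool tx_from_reserve(const struct tx_reserve *res, const void *p)
{
	uintptr_t a = (uintptr_t)p, lo = (uintptr_t)res->base;

	return res->base != NULL && a >= lo && a - lo < res->size;
}

/*
 * Allocate inside the transaction. The normal allocator is tried
 * first; normal_fails simulates it failing under memory pressure.
 * Only then do we dip into the reserve.
 */
static void *tx_alloc(struct tx_reserve *res, size_t bytes, bool normal_fails)
{
	void *p = normal_fails ? NULL : malloc(bytes);
	size_t left, need;

	if (p)
		return p;

	assert(res->used <= res->size);
	left = res->size - res->used;
	if (bytes == 0 || bytes > left)
		return NULL;	/* reserve undersized: the estimate was wrong */

	/* Round up so later reserve allocations keep the same alignment. */
	need = (bytes + TX_ALIGN - 1) & ~(TX_ALIGN - 1);
	if (need > left)
		need = left;

	p = res->base + res->used;
	res->used += need;
	return p;
}

/* Reserve memory is returned in bulk at release; heap memory is freed now. */
static void tx_free(struct tx_reserve *res, void *p)
{
	if (!tx_from_reserve(res, p))
		free(p);
}

/* Give the whole reserve back when the transaction commits or cancels. */
static void tx_reserve_release(struct tx_reserve *res)
{
	free(res->base);
	res->base = NULL;
	res->size = res->used = 0;
}
```

In this model a caller would run tx_reserve_init() at the same point
it takes its log space reservation, use tx_alloc() for every
allocation made inside the transaction, and call tx_reserve_release()
at commit. How large the reserve has to be per operation is exactly
the hard part discussed above, and the sketch does not attempt to
answer it.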