On Tue, Dec 03, 2013 at 10:01:01PM -0500, Johannes Weiner wrote:
> On Tue, Dec 03, 2013 at 03:40:13PM -0800, David Rientjes wrote:
> > On Tue, 3 Dec 2013, Johannes Weiner wrote:
> > I believe the page allocator would be susceptible to the same deadlock if
> > nothing else on the system can reclaim memory and that belief comes from
> > code inspection that shows __GFP_NOFAIL is not guaranteed to ever succeed
> > in the page allocator as their charges now are (with your patch) in memcg.
> > I do not have an example of such an incident.
>
> Me neither.

Is this the sort of thing that you expect to see when GFP_NOFS |
GFP_NOFAIL type allocations continually fail?

http://oss.sgi.com/archives/xfs/2013-12/msg00095.html

XFS doesn't use GFP_NOFAIL; it does its own loop with GFP_NOWARN in
kmem_alloc() so that if we get stuck for more than 100 attempts to
allocate, it throws a warning, i.e. only when we really are stuck and
reclaim is not making any progress.

This specific case is due to memory fragmentation preventing a 64k
memory allocation (due to the filesystem being configured with a 64k
directory block size), but GFP_NOFS | GFP_NOFAIL allocations happen
*all the time* in filesystems.

> > > > So, my question again: why not bypass the per-zone min watermarks in the
> > > > page allocator?
> > >
> > > I don't even know what your argument is supposed to be. The fact that
> > > we don't do it in the page allocator means that there can't be a bug
> > > in memcg?
> > >
> >
> > I'm asking if we should allow GFP_NOFS | __GFP_NOFAIL allocations in the
> > page allocator to bypass per-zone min watermarks after reclaim has failed
> > since the oom killer cannot be called in such a context so that the page
> > allocator is not susceptible to the same deadlock without a complete
> > depletion of memory reserves?
>
> Yes, I think so.

There be dragons.

If memcgs deadlock in low memory conditions in the presence of
GFP_NOFS | GFP_NOFAIL allocations, then we need to make the memcg
reclaim design more robust, not work around it by allowing
filesystems to drain critical memory reserves needed for other
situations....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
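
For readers unfamiliar with the retry loop described above for
kmem_alloc(), here is a minimal userspace sketch of the idea: retry
until the allocation succeeds, and warn only once every 100 failed
attempts. This is not the XFS source. The names alloc_retry_forever(),
flaky_alloc(), struct flaky_ctx and WARN_INTERVAL are hypothetical,
malloc() stands in for the kernel allocator, and the back-off the
kernel would perform between attempts is left out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: warn once per this many failed attempts. */
#define WARN_INTERVAL 100

/* Hypothetical allocator callback so that failure can be simulated. */
typedef void *(*try_alloc_fn)(size_t size, void *ctx);

/* Hypothetical test allocator state: fail this many times, then succeed. */
struct flaky_ctx {
	unsigned long failures_left;
};

/* Fails while failures_left > 0, then falls through to malloc(). */
static void *flaky_alloc(size_t size, void *ctx)
{
	struct flaky_ctx *fc = ctx;

	if (fc->failures_left > 0) {
		fc->failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Retry the allocation until it succeeds. A warning is printed only
 * after every WARN_INTERVAL consecutive failures, so it signals that
 * the caller is genuinely stuck rather than hitting a transient
 * failure. If the allocator never succeeds, this loops forever, which
 * is the "cannot fail" behaviour being illustrated.
 */
static void *alloc_retry_forever(size_t size, try_alloc_fn try_alloc,
				 void *ctx, unsigned long *warnings)
{
	unsigned long retries = 0;
	void *ptr;

	for (;;) {
		ptr = try_alloc(size, ctx);
		if (ptr)
			return ptr;

		if (++retries % WARN_INTERVAL == 0) {
			fprintf(stderr,
				"possible allocation deadlock: size %zu, %lu failed attempts\n",
				size, retries);
			if (warnings)
				(*warnings)++;
		}
		/* A real implementation would wait here so that reclaim
		 * has a chance to make progress; omitted in this sketch. */
	}
}
```

With a flaky_ctx set to fail 250 times, a 64k request through
alloc_retry_forever() warns at the 100th and 200th failed attempts and
then succeeds on the 251st call.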