On Wed, Jul 01, 2015 at 08:10:14AM +0200, Michal Hocko wrote:
> On Wed 01-07-15 08:58:51, Dave Chinner wrote:
> [...]
> > *blink*
> >
> > /me re-reads again
> >
> > That assumption is fundamentally broken. Filesystems use GFP_NOFS
> > because the filesystem holds resources that can prevent memory
> > reclaim making forwards progress if it re-enters the filesystem or
> > blocks on anything filesystem related. memcg does not change that,
> > and I'm kinda scared to learn that memcg plays fast and loose like
> > this.
> >
> > For example: IO completion might require unwritten extent conversion
> > which executes filesystem transactions and GFP_NOFS allocations. The
> > writeback flag on the pages can not be cleared until unwritten
> > extent conversion completes. Hence memory reclaim cannot wait on
> > page writeback to complete in GFP_NOFS context because it is not
> > safe to do so, memcg reclaim or otherwise.
>
> Thanks for the clarification.

Perhaps we need to make the documentation a bit more explicit?  All
that is stated in include/slab.h is:

 * %GFP_NOIO - Do not do any I/O at all while trying to get memory.
 *
 * %GFP_NOFS - Do not make any fs calls while trying to get memory.

I thought this was obvious, but these flags are used by code which is
in the I/O or FS paths, and so it's always possible that they are
trying to write back the very page which you decide to block on when
trying to do the memory allocation, at which point, *boom*, deadlock.

So it's not just "do not make any FS or I/O calls", but also "the mm
layer must not wait for any FS or I/O operations to complete, since
the operation you block on might be the one they were in the middle
of trying to complete, or they may be holding a lock at the time when
they were trying to do a memory allocation which blocks the I/O or FS
operation from completing".
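To make the rule concrete, here is a rough userspace sketch of the
decision reclaim has to make. It is not kernel code: the SK_* flag bits
and sk_may_wait_on_writeback() are made-up stand-ins for illustration.
The only thing it assumes is that a NOFS allocation leaves the "FS" bit
clear and a NOIO allocation leaves both the "FS" and "IO" bits clear.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in flag bits. The names and values are invented for this
 * sketch; they are not the kernel's definitions.
 */
#define SK_IO 0x1u	/* caller lets reclaim start and wait for I/O */
#define SK_FS 0x2u	/* caller lets reclaim call into the filesystem */

#define SK_GFP_KERNEL	(SK_IO | SK_FS)
#define SK_GFP_NOFS	(SK_IO)
#define SK_GFP_NOIO	(0u)

/*
 * Hypothetical predicate: may reclaim block until writeback of a page
 * completes?  Writeback completion can need both block I/O and
 * filesystem work (e.g. unwritten extent conversion), so waiting is
 * only safe when the caller allowed both.
 */
static inline bool sk_may_wait_on_writeback(unsigned int gfp)
{
	return (gfp & SK_IO) != 0 && (gfp & SK_FS) != 0;
}

/* Small self-check of the three common cases. */
static inline void sk_self_check(void)
{
	assert(sk_may_wait_on_writeback(SK_GFP_KERNEL));
	assert(!sk_may_wait_on_writeback(SK_GFP_NOFS));
	assert(!sk_may_wait_on_writeback(SK_GFP_NOIO));
}
```

The point of the sketch is that the NOFS case returns false even though
the "IO" bit is set: not calling into the filesystem is not enough,
reclaim must also not sleep on something only the filesystem can finish.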

						- Ted