On Wed 05-08-15 11:51:21, mhocko@xxxxxxxxxx wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
> memory allocator doesn't endlessly loop to satisfy low-order allocations
> and instead fails them to allow callers to handle them gracefully.
>
> Some of the callers are not yet prepared for this behavior though. ext4
> block allocator relies solely on GFP_NOFS allocation requests and
> allocation failures lead to aborting journal too easily:
>
> [ 345.028333] oom-trash: page allocation failure: order:0, mode:0x50
> [ 345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: G W 4.0.0-nofs3-00006-gdfe9931f5f68 #588
> [ 345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
> [ 345.028339] 0000000000000000 ffff880005a17708 ffffffff81538a54 ffffffff8107a40f
> [ 345.028341] 0000000000000050 ffff880005a17798 ffffffff810fe854 0000000180000000
> [ 345.028342] 0000000000000046 0000000000000000 ffffffff81a52100 0000000000000246
> [ 345.028343] Call Trace:
> [ 345.028348] [<ffffffff81538a54>] dump_stack+0x4f/0x7b
> [ 345.028370] [<ffffffff810fe854>] warn_alloc_failed+0x12a/0x13f
> [ 345.028373] [<ffffffff81101bd2>] __alloc_pages_nodemask+0x7f3/0x8aa
> [ 345.028375] [<ffffffff810f9933>] pagecache_get_page+0x12a/0x1c9
> [ 345.028390] [<ffffffffa005bc64>] ext4_mb_load_buddy+0x220/0x367 [ext4]
> [ 345.028414] [<ffffffffa006014f>] ext4_free_blocks+0x522/0xa4c [ext4]
> [ 345.028425] [<ffffffffa0054e14>] ext4_ext_remove_space+0x833/0xf22 [ext4]
> [ 345.028434] [<ffffffffa005677e>] ext4_ext_truncate+0x8c/0xb0 [ext4]
> [ 345.028441] [<ffffffffa00342bf>] ext4_truncate+0x20b/0x38d [ext4]
> [ 345.028462] [<ffffffffa003573c>] ext4_evict_inode+0x32b/0x4c1 [ext4]
> [ 345.028464] [<ffffffff8116d04f>] evict+0xa0/0x148
> [ 345.028466] [<ffffffff8116dca8>] iput+0x1a1/0x1f0
> [ 345.028468] [<ffffffff811697b4>] __dentry_kill+0x136/0x1a6
> [ 345.028470] [<ffffffff81169a3e>] dput+0x21a/0x243
> [ 345.028472] [<ffffffff81157cda>] __fput+0x184/0x19b
> [ 345.028473] [<ffffffff81157d29>] ____fput+0xe/0x10
> [ 345.028475] [<ffffffff8105a05f>] task_work_run+0x8a/0xa1
> [ 345.028477] [<ffffffff810452f0>] do_exit+0x3c6/0x8dc
> [ 345.028482] [<ffffffff8104588a>] do_group_exit+0x4d/0xb2
> [ 345.028483] [<ffffffff8104eeeb>] get_signal+0x5b1/0x5f5
> [ 345.028488] [<ffffffff81002202>] do_signal+0x28/0x5d0
> [...]
> [ 345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
> [ 345.033097] Aborting journal on device hdb1-8.
> [ 345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
> [ 345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [ 345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [ 345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
> [ 345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
> [ 345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [ 345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
> [ 345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [ 345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
> [ 345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
>
> The failure is really premature because GFP_NOFS allocation context is
> very restricted - especially in the fs metadata heavy loads. Before we
> go with a more sophisticated solution, let's simply imitate the previous
> behavior of non-failing NOFS allocation and use __GFP_NOFAIL for the
> buddy block allocator. I wasn't able to trigger the issue with this
> patch anymore.

The patch looks good.
You can add:

Reviewed-by: Jan Kara <jack@xxxxxxxx>

								Honza

> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
>  fs/ext4/mballoc.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 5b1613a54307..e6361622bfd5 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
>  	block = group * 2;
>  	pnum = block / blocks_per_page;
>  	poff = block % blocks_per_page;
> -	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +	page = find_or_create_page(inode->i_mapping, pnum,
> +					GFP_NOFS|__GFP_NOFAIL);
>  	if (!page)
>  		return -ENOMEM;
>  	BUG_ON(page->mapping != inode->i_mapping);
> @@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
>
>  	block++;
>  	pnum = block / blocks_per_page;
> -	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +	page = find_or_create_page(inode->i_mapping, pnum,
> +					GFP_NOFS|__GFP_NOFAIL);
>  	if (!page)
>  		return -ENOMEM;
>  	BUG_ON(page->mapping != inode->i_mapping);
> @@ -1158,7 +1160,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  			 * wait for it to initialize.
>  			 */
>  			page_cache_release(page);
> -		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +		page = find_or_create_page(inode->i_mapping, pnum,
> +						GFP_NOFS|__GFP_NOFAIL);
>  		if (page) {
>  			BUG_ON(page->mapping != inode->i_mapping);
>  			if (!PageUptodate(page)) {
> @@ -1194,7 +1197,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  	if (page == NULL || !PageUptodate(page)) {
>  		if (page)
>  			page_cache_release(page);
> -		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +		page = find_or_create_page(inode->i_mapping, pnum,
> +						GFP_NOFS|__GFP_NOFAIL);
>  		if (page) {
>  			BUG_ON(page->mapping != inode->i_mapping);
>  			if (!PageUptodate(page)) {
> --
> 2.5.0
>
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR