Re: fsstress-induced corruption reproduced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 04, 2010 at 05:08:55PM -0600, Eric Sandeen wrote:
> Eric Sandeen wrote:
> > Theodore Ts'o wrote:
> >> One of the things which has been annoying me for a while now is a
> >> hard-to-reproduce xfsqa failure in test #13 (fsstress), which causes the
> >> a test failure because the file system found to be inconsistent:
> >>
> >> Inode NNN, i_blocks is X, should be Y.
> > 
> > Interesting, this apparently has gotten much worse since 2.6.32.
> > 
> > I wrote an xfstests reproducer, and couldn't hit it on .32; hit it right
> > off on 2.6.33-rc2.
> > 
> > Probably should find out why ;) I'll go take a look.
> 
> commit d21cd8f163ac44b15c465aab7306db931c606908
> Author: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
> Date:   Thu Dec 10 03:31:45 2009 +0000
> 
>     ext4: Fix potential quota deadlock
> 
> seems to be the culprit.
> 
> (unfortunately this means that the error we saw before is something
> -else- to be fixed, yet)  Anyway ...
> 
> This is because we used to do this in ext4_mb_mark_diskspace_used() :
> 
>         /*
>          * Now reduce the dirty block count also. Should not go negative
>          */
>         if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
>                 /* release all the reserved blocks if non delalloc */
>                 percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> reserv_blks);
>         else {
>                 percpu_counter_sub(&sbi->s_dirtyblocks_counter,
>                                                 ac->ac_b_ex.fe_len);
>                 /* convert reserved quota blocks to real quota blocks */
>                 vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
> 	}
> 
> i.e. the vfs_dq_claim_block was conditional based on
> EXT4_MB_DELALLOC_RESERVED... and the testcase did not go that way,
> because we had already preallocated the blocks.
> 
> But with the above quota deadlock commit it's not unconditional
> anymore in ext4_da_update_reserve_space and we always call
> vfs_dq_claim_block which over-accounts.
> 

It is still conditional right ? We call ext4_da_update_reserve_space
only if EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE  is set . That will
happen only in case of delayed allocation. I guess the problem is
same as what Ted stated. But i am not sure why we are able to reproduce
it much easily on 2.6.33-rc2.


> Of course with the above commit, we have no allocation context in
> ext4_da_update_reserve_space... that's all long gone so we can't key
> on that anymore.
> 
> However, I think the following change will fix it; I'll run it through
> xfstests later on and be sure nothing else regresses.
> 
> -Eric
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5352db1..28cd8d8 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1257,9 +1257,10 @@ int ext4_get_blocks(handle_t *handle, struct
> inode *inode, sector_t block,
>          * if the caller is from delayed allocation writeout path
>          * we have already reserved fs blocks for allocation
>          * let the underlying get_block() function know to
> -        * avoid double accounting
> +        * avoid double accounting.  Ditto for prealloc blocks.
>          */
> -       if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> +       if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE ||
> +           flags & EXT4_GET_BLOCKS_UNINIT_EXT)
>                 EXT4_I(inode)->i_delalloc_reserved_flag = 1;
>         /*
>          * We need to check for EXT4 here because migrate
> 


But we need to do quota update during fallocate call. Doing the above
will result we not doing that. We would will also get block accounting
wrong because we now won't be doing ext4_claim_free_blocks for fallocate.

I guess what we need is to make sure that if we have any buffer_head mapping
the same block range allocated via fallocate and if they are marked BH_Delay
we need to clear the delay flag and update the block reservation. Later during
writepage we will find these buffer_heads mapped/non-delay and will do the right thing.

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux