Aneesh Kumar K.V wrote: > On Mon, Jan 04, 2010 at 05:08:55PM -0600, Eric Sandeen wrote: > >> Eric Sandeen wrote: >> >>> Theodore Ts'o wrote: >>> >>>> One of the things which has been annoying me for a while now is a >>>> hard-to-reproduce xfsqa failure in test #13 (fsstress), which causes the >>>> a test failure because the file system found to be inconsistent: >>>> >>>> Inode NNN, i_blocks is X, should be Y. >>>> >>> Interesting, this apparently has gotten much worse since 2.6.32. >>> >>> I wrote an xfstests reproducer, and couldn't hit it on .32; hit it right >>> off on 2.6.33-rc2. >>> >>> Probably should find out why ;) I'll go take a look. >>> >> commit d21cd8f163ac44b15c465aab7306db931c606908 >> Author: Dmitry Monakhov <dmonakhov@xxxxxxxxxx> >> Date: Thu Dec 10 03:31:45 2009 +0000 >> >> ext4: Fix potential quota deadlock >> >> seems to be the culprit. >> >> (unfortunately this means that the error we saw before is something >> -else- to be fixed, yet) Anyway ... >> >> This is because we used to do this in ext4_mb_mark_diskspace_used() : >> >> /* >> * Now reduce the dirty block count also. Should not go negative >> */ >> if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED)) >> /* release all the reserved blocks if non delalloc */ >> percpu_counter_sub(&sbi->s_dirtyblocks_counter, >> reserv_blks); >> else { >> percpu_counter_sub(&sbi->s_dirtyblocks_counter, >> ac->ac_b_ex.fe_len); >> /* convert reserved quota blocks to real quota blocks */ >> vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len); >> } >> >> i.e. the vfs_dq_claim_block was conditional based on >> EXT4_MB_DELALLOC_RESERVED... and the testcase did not go that way, >> because we had already preallocated the blocks. >> >> But with the above quota deadlock commit it's not unconditional >> anymore in ext4_da_update_reserve_space and we always call >> vfs_dq_claim_block which over-accounts. >> >> > > It is still conditional right ? We call ext4_da_update_reserve_space > only if EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE is set . That will > happen only in case of delayed allocation. I guess the problem is > same as what Ted stated. But i am not sure why we are able to reproduce > it much easily on 2.6.33-rc2. > > Maybe something like this works: Index: linux-2.6/fs/ext4/inode.c =================================================================== --- linux-2.6.orig/fs/ext4/inode.c +++ linux-2.6/fs/ext4/inode.c @@ -1203,6 +1203,7 @@ int ext4_get_blocks(handle_t *handle, st int flags) { int retval; + int was_unwritten; clear_buffer_mapped(bh); clear_buffer_unwritten(bh); @@ -1253,9 +1254,13 @@ int ext4_get_blocks(handle_t *handle, st * part of the uninitialized extent to be an initialized * extent. This is because we need to avoid the combination * of BH_Unwritten and BH_Mapped flags being simultaneously - * set on the buffer_head. + * set on the buffer_head. However, if it was unwritten we + * don't want to update reserved space later. */ - clear_buffer_unwritten(bh); + if (buffer_unwritten(bh)) { + was_unwritten = 1; + clear_buffer_unwritten(bh); + } /* * New blocks allocate and/or writing to uninitialized extent @@ -1301,7 +1306,8 @@ int ext4_get_blocks(handle_t *handle, st * Update reserved blocks/metadata blocks after successful * block allocation which had been deferred till now. */ - if ((retval > 0) && (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)) + if ((retval > 0) && !was_unwritten && + (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)) ext4_da_update_reserve_space(inode, retval); up_write((&EXT4_I(inode)->i_data_sem)); but that might leave the previous reservations hanging around from prior to the fallocate ... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html