Dmitry Monakhov <dmonakhov@xxxxxxxxxx> writes:

> To prevent ext4_reservation vs dquot_reservation inconsistency, we have
> to reorganize locking ordering like follows:
>   i_block_reservation_lock > dqptr_sem
> This means what all functions which changes ext4 or quota reservation have
> to hold i_block_reservation_lock.

While investigating this issue I've considered another solution:

* Introduce an i_block analog for generic reserved space management:
  We may introduce an i_rsv_block field in the generic inode, protect it
  by i_lock (similar to i_block), and introduce inc/dec/set/get helpers
  similar to inode_get_bytes, inode_sub_bytes, etc. This value would be
  managed internally by the quota code: reservation management would be
  done inside dquot_reserve_space and dquot_release_reservation without
  interfering with fs internals, just as we do for i_blocks.

  IMHO this is the best way because:
  1) We don't have to hold i_block_reservation_lock across the quota
     operation, which may otherwise cause a performance penalty.
  2) It gives us a well-defined VFS interface for reserved space management.

  But I expect some objections from Al Viro, because only ext4 would use it
  for now. And of course this may still lead to ext4_rsv vs quota_rsv
  inconsistency due to bugs. A rough sketch of what such helpers might look
  like is below.
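To make the idea concrete, here is a minimal sketch of what such generic
helpers could look like. This is only an illustration of the proposal, not
existing kernel code: the i_reserved_bytes field and the inode_*_rsv_bytes
names are invented here, modelled on inode_add_bytes()/inode_sub_bytes().

/*
 * Sketch only: i_reserved_bytes would be a new field in struct inode,
 * protected by inode->i_lock, so the quota code could account reserved
 * space without knowing anything about filesystem internals.
 */
static inline void inode_add_rsv_bytes(struct inode *inode, loff_t bytes)
{
        spin_lock(&inode->i_lock);
        inode->i_reserved_bytes += bytes;       /* hypothetical new field */
        spin_unlock(&inode->i_lock);
}

static inline void inode_sub_rsv_bytes(struct inode *inode, loff_t bytes)
{
        spin_lock(&inode->i_lock);
        inode->i_reserved_bytes -= bytes;
        spin_unlock(&inode->i_lock);
}

static inline loff_t inode_get_rsv_bytes(struct inode *inode)
{
        loff_t ret;

        spin_lock(&inode->i_lock);
        ret = inode->i_reserved_bytes;
        spin_unlock(&inode->i_lock);
        return ret;
}

dquot_reserve_space() and dquot_release_reservation() would then update this
counter themselves, and ext4 would never have to expose its
i_block_reservation_lock to the quota layer.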
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
> ---
>  fs/ext4/inode.c |   23 ++++++++++++++++++-----
>  1 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 84863e6..c521c93 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1047,16 +1047,23 @@ cleanup:
>  out:
>          return err;
>  }
> +/*
> + * Quota_transfer callback.
> + * During quota transfer we have to transfer rsv-blocks from one dquot to
> + * another. inode must be protected from concurrent reservation/reclamation.
> + * Locking ordering for all space reservation code:
> + *    i_block_reservation_lock > dqptr_sem
> + * This is differ from i_block,i_lock locking ordering, but this is the
> + * only possible way.
> + * NOTE: Caller must hold i_block_reservation_lock.
> + */
>  
>  qsize_t ext4_get_reserved_space(struct inode *inode)
>  {
>          unsigned long long total;
>  
> -        spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
>          total = EXT4_I(inode)->i_reserved_data_blocks +
>                  EXT4_I(inode)->i_reserved_meta_blocks;
> -        spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> -
>          return (total << inode->i_blkbits);
>  }
>  /*
> @@ -1131,6 +1138,8 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>          if (mdb_free)
>                  vfs_dq_release_reservation_block(inode, mdb_free);
>  
> +        spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> +
>          /*
>           * If we have done all the pending block allocations and if
>           * there aren't any writers on the inode, we can discard the
> @@ -1866,8 +1875,8 @@ repeat:
>          }
>  
>          if (ext4_claim_free_blocks(sbi, total)) {
> -                spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>                  vfs_dq_release_reservation_block(inode, total);
> +                spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>                  if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
>                          yield();
>                          goto repeat;
> @@ -1924,9 +1933,9 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
>  
>          BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
>          EXT4_I(inode)->i_reserved_meta_blocks = mdb;
> -        spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>  
>          vfs_dq_release_reservation_block(inode, release);
> +        spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>  }
>  
>  static void ext4_da_page_release_reservation(struct page *page,
> @@ -5436,7 +5445,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>                  error = PTR_ERR(handle);
>                  goto err_out;
>          }
> +        /* i_block_reservation must being held in order to avoid races
> +         * with concurent block reservation. */
> +        spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
>          error = vfs_dq_transfer(inode, attr) ? -EDQUOT : 0;
> +        spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>          if (error) {
>                  ext4_journal_stop(handle);
>                  return error;
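For completeness, the ordering the patch establishes boils down to the
pattern below. This only restates what the quoted hunks already do (the
ext4_setattr() case); it is not an additional change. The assumption, per
the commit message, is that the quota call (here vfs_dq_transfer()) takes
dqptr_sem internally.

/*
 * New ordering: i_block_reservation_lock > dqptr_sem.
 * Take the per-inode reservation lock first, then call into the quota
 * code. The quota-transfer path may call back into
 * ext4_get_reserved_space(), which is why that helper no longer takes
 * the lock itself and instead documents that the caller must hold it.
 */
spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
error = vfs_dq_transfer(inode, attr) ? -EDQUOT : 0;     /* takes dqptr_sem */
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);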