Re: [PATCH 05/10] ext4: Fix race when reusing xattr blocks

On 22/07/12 12:54PM, Jan Kara wrote:
> When ext4_xattr_block_set() decides to remove an xattr block, the following
> race can happen:
>
> CPU1                                    CPU2
> ext4_xattr_block_set()                  ext4_xattr_release_block()
>   new_bh = ext4_xattr_block_cache_find()
>
>                                           lock_buffer(bh);
>                                           ref = le32_to_cpu(BHDR(bh)->h_refcount);
>                                           if (ref == 1) {
>                                             ...
>                                             mb_cache_entry_delete();
>                                             unlock_buffer(bh);
>                                             ext4_free_blocks();
>                                               ...
>                                               ext4_forget(..., bh, ...);
>                                                 jbd2_journal_revoke(..., bh);
>
>   ext4_journal_get_write_access(..., new_bh, ...)
>     do_get_write_access()
>       jbd2_journal_cancel_revoke(..., new_bh);
>
> Later the code in ext4_xattr_block_set() finds out the block got freed
> and cancels reuse of the block, but the revoke stays canceled, so in
> case of block reuse and journal replay the filesystem can get corrupted.
> If the race works out slightly differently, we can also hit assertions
> in the jbd2 code.
>
> Fix the problem by making sure that once a matching mbcache entry is
> found, code dropping the last xattr block reference (or trying to modify
> the xattr block in place) waits until the mbcache entry reference is
> dropped. This way code trying to reuse an xattr block is protected from
> someone trying to drop the last reference to that xattr block.
>
> Reported-and-tested-by: Ritesh Harjani <ritesh.list@xxxxxxxxx>
> CC: stable@xxxxxxxxxxxxxxx
> Fixes: 82939d7999df ("ext4: convert to mbcache2")
> Signed-off-by: Jan Kara <jack@xxxxxxx>
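
A minimal userspace sketch of the "delete the entry unless it is busy, otherwise wait for it and retry" idea described in the fix. All `toy_*` names are hypothetical stand-ins, not the kernel's mbcache API. The model is single-threaded, so the sleep inside `mb_cache_entry_wait_unused()` is replaced by a callback that simulates the other user dropping its reference, and it assumes that a reference count of 1 means only the cache itself still holds the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy stand-in for an mbcache entry (not struct mb_cache_entry). */
struct toy_entry {
	int refcnt;   /* assumption: 1 means only the cache holds the entry */
	bool hashed;  /* true while lookups can still find the entry */
};

/* Drop one reference. */
static void toy_put(struct toy_entry *e)
{
	e->refcnt--;
}

/*
 * Delete the entry if nobody else uses it and return NULL; otherwise take
 * an extra reference and return the entry so the caller can wait on it.
 */
static struct toy_entry *toy_delete_or_get(struct toy_entry *e)
{
	if (!e->hashed)
		return NULL;
	if (e->refcnt == 1) {
		e->hashed = false;  /* unhash so no new user can find it */
		e->refcnt = 0;      /* drop the cache's own reference */
		return NULL;
	}
	e->refcnt++;
	return e;
}

/*
 * Single-threaded stand-in for waiting: let the "other CPU" make progress
 * until only the cache's reference and ours remain.
 */
static void toy_wait_unused(struct toy_entry *e,
			    void (*other_user_step)(struct toy_entry *))
{
	while (e->refcnt > 2)
		other_user_step(e);
}

/* Same shape as the retry loop the patch adds to ext4_evict_ea_inode(). */
static int toy_remove_entry(struct toy_entry *e,
			    void (*other_user_step)(struct toy_entry *))
{
	struct toy_entry *oe;
	int waits = 0;

	while ((oe = toy_delete_or_get(e))) {
		toy_wait_unused(oe, other_user_step);
		toy_put(oe);
		waits++;
	}
	return waits;
}
```

For an entry that another user is in the middle of reusing (`refcnt == 2`), the loop waits once and then deletes it on the retry; an idle entry (`refcnt == 1`) is deleted on the first attempt without waiting.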

Thanks Jan,
Just a note: I retested only the patches up to this one (the ones marked for
stable) with stress-ng --xattr 16, and I didn't find any issues so far on
ext2, ext3 or ext4 with default mkfs options.

I also re-ran the full v3 patch series with the same test case on all 3
filesystems and didn't see any failures.

-ritesh




> ---
>  fs/ext4/xattr.c | 67 +++++++++++++++++++++++++++++++++----------------
>  1 file changed, 45 insertions(+), 22 deletions(-)
>
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index aadfae53d055..3a0928c8720e 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -439,9 +439,16 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
>  /* Remove entry from mbcache when EA inode is getting evicted */
>  void ext4_evict_ea_inode(struct inode *inode)
>  {
> -	if (EA_INODE_CACHE(inode))
> -		mb_cache_entry_delete(EA_INODE_CACHE(inode),
> -			ext4_xattr_inode_get_hash(inode), inode->i_ino);
> +	struct mb_cache_entry *oe;
> +
> +	if (!EA_INODE_CACHE(inode))
> +		return;
> +	/* Wait for entry to get unused so that we can remove it */
> +	while ((oe = mb_cache_entry_delete_or_get(EA_INODE_CACHE(inode),
> +			ext4_xattr_inode_get_hash(inode), inode->i_ino))) {
> +		mb_cache_entry_wait_unused(oe);
> +		mb_cache_entry_put(EA_INODE_CACHE(inode), oe);
> +	}
>  }
>
>  static int
> @@ -1229,6 +1236,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
>  	if (error)
>  		goto out;
>
> +retry_ref:
>  	lock_buffer(bh);
>  	hash = le32_to_cpu(BHDR(bh)->h_hash);
>  	ref = le32_to_cpu(BHDR(bh)->h_refcount);
> @@ -1238,9 +1246,18 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
>  		 * This must happen under buffer lock for
>  		 * ext4_xattr_block_set() to reliably detect freed block
>  		 */
> -		if (ea_block_cache)
> -			mb_cache_entry_delete(ea_block_cache, hash,
> -					      bh->b_blocknr);
> +		if (ea_block_cache) {
> +			struct mb_cache_entry *oe;
> +
> +			oe = mb_cache_entry_delete_or_get(ea_block_cache, hash,
> +							  bh->b_blocknr);
> +			if (oe) {
> +				unlock_buffer(bh);
> +				mb_cache_entry_wait_unused(oe);
> +				mb_cache_entry_put(ea_block_cache, oe);
> +				goto retry_ref;
> +			}
> +		}
>  		get_bh(bh);
>  		unlock_buffer(bh);
>
> @@ -1867,9 +1884,20 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
>  			 * ext4_xattr_block_set() to reliably detect modified
>  			 * block
>  			 */
> -			if (ea_block_cache)
> -				mb_cache_entry_delete(ea_block_cache, hash,
> -						      bs->bh->b_blocknr);
> +			if (ea_block_cache) {
> +				struct mb_cache_entry *oe;
> +
> +				oe = mb_cache_entry_delete_or_get(ea_block_cache,
> +					hash, bs->bh->b_blocknr);
> +				if (oe) {
> +					/*
> +					 * Xattr block is getting reused. Leave
> +					 * it alone.
> +					 */
> +					mb_cache_entry_put(ea_block_cache, oe);
> +					goto clone_block;
> +				}
> +			}
>  			ea_bdebug(bs->bh, "modifying in-place");
>  			error = ext4_xattr_set_entry(i, s, handle, inode,
>  						     true /* is_block */);
> @@ -1885,6 +1913,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
>  				goto cleanup;
>  			goto inserted;
>  		}
> +clone_block:
>  		unlock_buffer(bs->bh);
>  		ea_bdebug(bs->bh, "cloning");
>  		s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
> @@ -1991,18 +2020,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
>  				lock_buffer(new_bh);
>  				/*
>  				 * We have to be careful about races with
> -				 * freeing, rehashing or adding references to
> -				 * xattr block. Once we hold buffer lock xattr
> -				 * block's state is stable so we can check
> -				 * whether the block got freed / rehashed or
> -				 * not.  Since we unhash mbcache entry under
> -				 * buffer lock when freeing / rehashing xattr
> -				 * block, checking whether entry is still
> -				 * hashed is reliable. Same rules hold for
> -				 * e_reusable handling.
> +				 * adding references to xattr block. Once we
> +				 * hold buffer lock xattr block's state is
> +				 * stable so we can check the additional
> +				 * reference fits.
>  				 */
> -				if (hlist_bl_unhashed(&ce->e_hash_list) ||
> -				    !ce->e_reusable) {
> +				ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
> +				if (ref > EXT4_XATTR_REFCOUNT_MAX) {
>  					/*
>  					 * Undo everything and check mbcache
>  					 * again.
> @@ -2017,9 +2041,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
>  					new_bh = NULL;
>  					goto inserted;
>  				}
> -				ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
>  				BHDR(new_bh)->h_refcount = cpu_to_le32(ref);
> -				if (ref >= EXT4_XATTR_REFCOUNT_MAX)
> +				if (ref == EXT4_XATTR_REFCOUNT_MAX)
>  					ce->e_reusable = 0;
>  				ea_bdebug(new_bh, "reusing; refcount now=%d",
>  					  ref);
> --
> 2.35.3
>
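
The reordered refcount check in the last hunk can be sketched in isolation. `toy_try_add_ref()` is a hypothetical helper, and the value 1024 for the limit is an assumption mirroring `EXT4_XATTR_REFCOUNT_MAX`. What the sketch tries to show is the order of operations: compute the new count first, refuse the reuse if it would exceed the limit (the real code then undoes its work and checks mbcache again), and stop advertising the block as reusable once the limit is reached exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors EXT4_XATTR_REFCOUNT_MAX; verify against fs/ext4/xattr.h. */
#define TOY_XATTR_REFCOUNT_MAX 1024u

/*
 * Hypothetical helper modelled on the new check in ext4_xattr_block_set().
 * Returns false if one more reference would exceed the limit (the caller
 * must back out and look for another candidate block). Otherwise it bumps
 * the count and clears *reusable when the limit has just been reached.
 */
static bool toy_try_add_ref(uint32_t *h_refcount, bool *reusable)
{
	uint32_t ref = *h_refcount + 1;

	if (ref > TOY_XATTR_REFCOUNT_MAX)
		return false;
	*h_refcount = ref;
	if (ref == TOY_XATTR_REFCOUNT_MAX)
		*reusable = false;
	return true;
}
```

Per the updated comment in the hunk, once the reuser holds the mbcache entry reference the block can no longer be freed or rehashed underneath it, so whether the extra reference fits is the only thing left to check under the buffer lock.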

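The in-place modification path in ext4_xattr_block_set() handles a busy entry differently from the release path: instead of waiting, it falls back to cloning the block. A hypothetical sketch of that decision, again using made-up `toy_*` names in place of the real mbcache calls and the same assumed convention that a count of 1 means only the cache holds the entry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy stand-in for an mbcache entry. */
struct toy_entry {
	int refcnt;   /* assumption: 1 means only the cache holds the entry */
	bool hashed;
};

enum toy_action { TOY_MODIFY_IN_PLACE, TOY_CLONE_BLOCK };

/* Hypothetical decision mirroring the in-place hunk of the patch. */
static enum toy_action toy_choose_action(struct toy_entry *e)
{
	if (e && e->hashed) {
		if (e->refcnt > 1)
			/* Someone is reusing the block: leave it alone and clone. */
			return TOY_CLONE_BLOCK;
		/* Unused: unhash it so nobody finds it, then modify in place. */
		e->hashed = false;
		e->refcnt = 0;
	}
	/* No cache entry at all: nothing to protect, modify in place. */
	return TOY_MODIFY_IN_PLACE;
}
```

A plausible reason for not waiting here is that this path still holds the buffer lock of the old block (it is only released at the `clone_block:` label), while a reuser holding the entry reference may itself need that same lock to make progress.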

