Re: [PATCH] ext[34] EA block reference count racing fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 21 Feb 2007 15:54:10 -0800 Mingming Cao <cmm@xxxxxxxxxx> wrote:

> There are race issues around ext[34] xattr block release code.
> 
> ext[34]_xattr_release_block() checks the reference count of xattr block
> (h_refcount) and frees that xattr block if it is the last one reference
> it. Unlike ext2, the check of this counter is unprotected by any lock.
> ext[34]_xattr_release_block() will free the mb_cache entry before
> freeing that xattr block. There is a small window between the check for
> the re h_refcount ==1 and the call to mb_cache_entry_free(). During this
> small window another inode might find this xattr block from the mbcache
> and reuse it, racing a refcount updates.  The xattr block will later be
> freed by the first inode without notice other inode is still use it.
> Later if that block is reallocated as a datablock for other file, then
> more serious problem might happen.
> 
> We need put a lock around places checking the refount as well to avoid
> racing issue. Another place need this kind of protection is in
> ext3_xattr_block_set(), where it will modify the xattr block content in-
> the-fly if the refcount is 1 (means it's the only inode reference it).
> 
> This will also fix another issue: the xattr block may not get freed at
> all if no lock is to protect the refcount check at the release time. It
> is possible that the last two inodes could release the shared xattr
> block at the same time. But both of them think they are not the last one
> so only decreased the h_refcount without freeing xattr block at all.
> 
> We need to call lock_buffer() after ext3_journal_get_write_access() to
> avoid deadlock (because the later will call lock_buffer()/unlock_buffer
> () as well).
> 
> 
> Signed-Off-By: Mingming Cao <cmm@xxxxxxxxxx>
> ---
> 
>  linux-2.6.19-ming/fs/ext3/xattr.c |   42 +++++++++++++++++++++++---------------
>  linux-2.6.19-ming/fs/ext4/xattr.c |   41 ++++++++++++++++++++++---------------
>  2 files changed, 51 insertions(+), 32 deletions(-)
> 
> diff -puN fs/ext3/xattr.c~ext34_xattr_release_block_race_fix fs/ext3/xattr.c
> --- linux-2.6.19/fs/ext3/xattr.c~ext34_xattr_release_block_race_fix	2007-02-14 16:40:48.000000000 -0800
> +++ linux-2.6.19-ming/fs/ext3/xattr.c	2007-02-21 11:45:10.000000000 -0800
> @@ -478,8 +478,15 @@ ext3_xattr_release_block(handle_t *handl
>  			 struct buffer_head *bh)
>  {
>  	struct mb_cache_entry *ce = NULL;
> +	int error = 0;
>  
>  	ce = mb_cache_entry_get(ext3_xattr_cache, bh->b_bdev, bh->b_blocknr);
> +	error = ext3_journal_get_write_access(handle, bh);
> +	if (error)
> +		 goto out;
> +
> +	lock_buffer(bh);
> +
>  	if (BHDR(bh)->h_refcount == cpu_to_le32(1)) {
>  		ea_bdebug(bh, "refcount now=0; freeing");
>  		if (ce)
> @@ -488,21 +495,20 @@ ext3_xattr_release_block(handle_t *handl
>  		get_bh(bh);
>  		ext3_forget(handle, 1, inode, bh, bh->b_blocknr);
>  	} else {
> -		if (ext3_journal_get_write_access(handle, bh) == 0) {
> -			lock_buffer(bh);
> -			BHDR(bh)->h_refcount = cpu_to_le32(
> +		BHDR(bh)->h_refcount = cpu_to_le32(
>  				le32_to_cpu(BHDR(bh)->h_refcount) - 1);
> -			ext3_journal_dirty_metadata(handle, bh);
> -			if (IS_SYNC(inode))
> -				handle->h_sync = 1;
> -			DQUOT_FREE_BLOCK(inode, 1);
> -			unlock_buffer(bh);
> -			ea_bdebug(bh, "refcount now=%d; releasing",
> -				  le32_to_cpu(BHDR(bh)->h_refcount));
> -		}
> +		error = ext3_journal_dirty_metadata(handle, bh);
> +		handle->h_sync = 1;
> +		DQUOT_FREE_BLOCK(inode, 1);
> +		ea_bdebug(bh, "refcount now=%d; releasing",
> +			  le32_to_cpu(BHDR(bh)->h_refcount));
>  		if (ce)
>  			mb_cache_entry_release(ce);
>  	}
> +	unlock_buffer(bh);
> +out:
> +	ext3_std_error(inode->i_sb, error);
> +	return;
>  }
>  
>  struct ext3_xattr_info {
> @@ -678,7 +684,7 @@ ext3_xattr_block_set(handle_t *handle, s
>  	struct buffer_head *new_bh = NULL;
>  	struct ext3_xattr_search *s = &bs->s;
>  	struct mb_cache_entry *ce = NULL;
> -	int error;
> +	int error = 0;
>  
>  #define header(x) ((struct ext3_xattr_header *)(x))
>  
> @@ -687,16 +693,17 @@ ext3_xattr_block_set(handle_t *handle, s
>  	if (s->base) {
>  		ce = mb_cache_entry_get(ext3_xattr_cache, bs->bh->b_bdev,
>  					bs->bh->b_blocknr);
> +		error = ext3_journal_get_write_access(handle, bs->bh);
> +		if (error)
> +			goto cleanup;
> +		lock_buffer(bs->bh);
> +
>  		if (header(s->base)->h_refcount == cpu_to_le32(1)) {
>  			if (ce) {
>  				mb_cache_entry_free(ce);
>  				ce = NULL;
>  			}
>  			ea_bdebug(bs->bh, "modifying in-place");
> -			error = ext3_journal_get_write_access(handle, bs->bh);
> -			if (error)
> -				goto cleanup;
> -			lock_buffer(bs->bh);
>  			error = ext3_xattr_set_entry(i, s);
>  			if (!error) {
>  				if (!IS_LAST_ENTRY(s->first))
> @@ -716,6 +723,9 @@ ext3_xattr_block_set(handle_t *handle, s
>  		} else {
>  			int offset = (char *)s->here - bs->bh->b_data;
>  
> +			unlock_buffer(bs->bh);
> +			journal_release_buffer(handle, bs->bh);
> +
>  			if (ce) {
>  				mb_cache_entry_release(ce);
>  				ce = NULL;

This patch has destroyed ext3 performance - a `poppatch 200' with
everything in pagecache has gone from 4.3 seconds up to 21.4 seconds, which
casts a pall upon my normally cheery disposition.


It's been in there for three weeks and nobody has noticed, which is
amazing, in a worrisome way.  Perhaps EAs are only used when using SELinux,
and everyone turns off SELinux.  Or perhaps everyone is using ext4.  Or
perhaps nobody is testing mainline.


--- a/fs/ext3/xattr.c~ext-ea-block-reference-count-racing-fix-performance-fix
+++ a/fs/ext3/xattr.c
@@ -495,7 +495,8 @@ ext3_xattr_release_block(handle_t *handl
 		BHDR(bh)->h_refcount = cpu_to_le32(
 				le32_to_cpu(BHDR(bh)->h_refcount) - 1);
 		error = ext3_journal_dirty_metadata(handle, bh);
-		handle->h_sync = 1;
+		if (IS_SYNC(inode))
+			handle->h_sync = 1;
 		DQUOT_FREE_BLOCK(inode, 1);
 		ea_bdebug(bh, "refcount now=%d; releasing",
 			  le32_to_cpu(BHDR(bh)->h_refcount));
_

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux