Re: [PATCH 3.14] ext4: fix data exposure after a crash

The backport looks good to me.

								Honza

On Tue 28-06-16 18:31:48, George G. Davis wrote:
> From: Jan Kara <jack@xxxxxxx>
> 
> commit 06bd3c36a733ac27962fea7d6f47168841376824 upstream
> 
> Huang has reported that in his powerfail testing he is seeing stale
> block contents in some of the recently allocated blocks although he
> mounts ext4 in data=ordered mode. After some investigation I have found
> out that indeed when delayed allocation is used, we don't add the inode
> to the transaction's list of inodes needing flushing before commit.
> Originally we were doing that, but commit f3b59291a69d removed the logic
> with a flawed argument that it is not needed.
> 
> The problem is that although for delayed allocated blocks we write their
> contents immediately after allocating them, there is no guarantee that
> the IO scheduler or device doesn't reorder things, and thus the
> transaction allocating blocks and attaching them to the inode can reach
> stable storage before the actual block contents. In fact, whenever we
> attach freshly allocated blocks to an inode using a written extent, we
> should add the inode to the transaction's ordered inode list to make
> sure we properly wait for block contents to be written before committing
> the transaction. So that is what we do in this patch. This also handles
> other cases where stale data exposure was possible, like filling a hole
> via mmap in data=ordered,nodelalloc mode.
> 
> The only exception to the above rule is extending direct IO writes, where
> blkdev_direct_IO() waits for IO to complete before increasing i_size and
> thus stale data exposure is not possible. For now we don't complicate
> the code with optimizing this special case since the overhead is pretty
> low. If this is observed to be a performance problem, we can always
> handle it using a special flag to ext4_map_blocks().
> 
> Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d
> Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@xxxxxxxxxxxx>
> Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@xxxxxxxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
> [weller: fix conflict with 3.14 kernel]
> Signed-off-by: weller huang <weller.huang@xxxxxxxxxxxx>
> Signed-off-by: George G. Davis <george_davis@xxxxxxxxxx>
> ---
> gdavis: Confirmed that backport conflicts are due to lack of upstream
> 	commits c86d8db ("ext4: implement allocation of pre-zeroed
> 	blocks") and 09cbfea ("mm, fs: get rid of PAGE_CACHE_* and
> 	page_cache_{get,release} macros") in v3.14.37. The conflict
> 	resolution therefore appears to be correct.
> ---
>  fs/ext4/inode.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 58001fc..d33a80e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -695,6 +695,20 @@ has_zeroout:
>  		int ret = check_block_validity(inode, map);
>  		if (ret != 0)
>  			return ret;
> +
> +		/*
> +		 * Inodes with freshly allocated blocks where contents will be
> +		 * visible after transaction commit must be on transaction's
> +		 * ordered data list.
> +		 */
> +		if (map->m_flags & EXT4_MAP_NEW &&
> +		    !(map->m_flags & EXT4_MAP_UNWRITTEN) &&
> +		    !IS_NOQUOTA(inode) &&
> +		    ext4_should_order_data(inode)) {
> +			ret = ext4_jbd2_file_inode(handle, inode);
> +			if (ret)
> +				return ret;
> +		}
>  	}
>  	return retval;
>  }
> @@ -1059,15 +1073,6 @@ static int ext4_write_end(struct file *file,
>  	int i_size_changed = 0;
>  
>  	trace_ext4_write_end(inode, pos, len, copied);
> -	if (ext4_test_inode_state(inode, EXT4_STATE_ORDERED_MODE)) {
> -		ret = ext4_jbd2_file_inode(handle, inode);
> -		if (ret) {
> -			unlock_page(page);
> -			page_cache_release(page);
> -			goto errout;
> -		}
> -	}
> -
>  	if (ext4_has_inline_data(inode)) {
>  		ret = ext4_write_inline_data_end(inode, pos, len,
>  						 copied, page);
> -- 
> 1.9.3
> 
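
For readers following the hunk at has_zeroout, the condition it adds can be
modelled in userspace roughly as below. This is only a sketch: the SKETCH_*
flag values, struct sketch_inode and sketch_needs_ordered_list() are
hypothetical stand-ins for EXT4_MAP_NEW, EXT4_MAP_UNWRITTEN, IS_NOQUOTA() and
ext4_should_order_data(), not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real EXT4_MAP_* flags are defined in the kernel. */
#define SKETCH_MAP_NEW       (1u << 0)
#define SKETCH_MAP_UNWRITTEN (1u << 1)

/* Hypothetical stand-in for the inode properties the patch consults. */
struct sketch_inode {
	bool noquota;      /* models IS_NOQUOTA(inode) */
	bool ordered_mode; /* models ext4_should_order_data(inode) */
};

/*
 * Mirrors the condition added by the patch: a freshly allocated, written
 * (not unwritten) mapping on an ordered-mode, non-quota inode means the
 * inode has to go on the transaction's ordered data list.
 */
static bool sketch_needs_ordered_list(unsigned int m_flags,
				      const struct sketch_inode *inode)
{
	return (m_flags & SKETCH_MAP_NEW) &&
	       !(m_flags & SKETCH_MAP_UNWRITTEN) &&
	       !inode->noquota &&
	       inode->ordered_mode;
}

/* Small self-check covering the cases the commit message talks about. */
static void sketch_self_check(void)
{
	struct sketch_inode ordered = { .noquota = false, .ordered_mode = true };
	struct sketch_inode other = { .noquota = false, .ordered_mode = false };

	/* New written extent in data=ordered: must be filed. */
	assert(sketch_needs_ordered_list(SKETCH_MAP_NEW, &ordered));
	/* Unwritten extent: stale contents are not visible, nothing to order. */
	assert(!sketch_needs_ordered_list(SKETCH_MAP_NEW | SKETCH_MAP_UNWRITTEN,
					  &ordered));
	/* Mapping of already allocated blocks: no new data to wait for. */
	assert(!sketch_needs_ordered_list(0, &ordered));
	/* Inode not in ordered mode: the ordered list is not used. */
	assert(!sketch_needs_ordered_list(SKETCH_MAP_NEW, &other));
}
```

In the patch itself a true result leads to ext4_jbd2_file_inode(handle, inode),
and a non-zero return from that call is propagated to the caller.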
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


