Re: [PATCH 3.14] ext4: fix data exposure after a crash

On Fri 06-10-17 10:04:41, HUANG Weller (CM/ESW12-CN) wrote:
> Hello Mr. Davis and Mr. Kara,
> 
> I checked the latest 3.14 source code (3.14.79) and didn't find the patch below.
> 
> It should be there, right?

Well, I'm not sure who runs the 3.14 stable tree (it's not listed at
kernel.org). It's up to him to pick up patches...

								Honza

> > -----Original Message-----
> > From: Jan Kara [mailto:jack@xxxxxxx]
> > Sent: Wednesday, June 29, 2016 3:46 PM
> > To: George G. Davis <george_davis@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx; Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>; Behme Dirk
> > (CM/ESO2) <Dirk.Behme@xxxxxxxxxxxx>; Jan Kara <jack@xxxxxxx>; Theodore
> > Ts'o <tytso@xxxxxxx>; HUANG Weller (CM/ESW12-CN)
> > <Weller.Huang@xxxxxxxxxxxx>
> > Subject: Re: [PATCH 3.14] ext4: fix data exposure after a crash
> > 
> > The backport looks good to me.
> > 
> > 								Honza
> > 
> > On Tue 28-06-16 18:31:48, George G. Davis wrote:
> > > From: Jan Kara <jack@xxxxxxx>
> > >
> > > From: Jan Kara <jack@xxxxxxx>
> > >
> > > commit 06bd3c36a733ac27962fea7d6f47168841376824 upstream
> > >
> > > Huang has reported that in his powerfail testing he is seeing stale
> > > > block contents in some recently allocated blocks although he mounts
> > > ext4 in data=ordered mode. After some investigation I have found out
> > > that indeed when delayed allocation is used, we don't add inode to
> > > transaction's list of inodes needing flushing before commit.
> > > Originally we were doing that but commit f3b59291a69d removed the
> > > logic with a flawed argument that it is not needed.
> > >
> > > The problem is that although for delayed allocated blocks we write
> > > their contents immediately after allocating them, there is no
> > > guarantee that the IO scheduler or device doesn't reorder things and
> > > thus transaction allocating blocks and attaching them to inode can
> > > reach stable storage before actual block contents. Actually whenever
> > > we attach freshly allocated blocks to inode using a written extent, we
> > > should add inode to transaction's ordered inode list to make sure we
> > > properly wait for block contents to be written before committing the
> > > transaction. So that is what we do in this patch. This also handles
> > > other cases where stale data exposure was possible - like filling hole
> > > via mmap in data=ordered,nodelalloc mode.
> > >
> > > > The only exception to the above rule is extending direct IO writes,
> > > > where blkdev_direct_IO() waits for IO to complete before increasing
> > > > i_size and thus stale data exposure is not possible. For now we don't
> > > complicate the code with optimizing this special case since the
> > > overhead is pretty low. In case this is observed to be a performance
> > > problem we can always handle it using a special flag to ext4_map_blocks().
> > >
> > > Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d
> > > Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@xxxxxxxxxxxx>
> > > Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@xxxxxxxxxxxx>
> > > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > > Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
> > > [weller: fix conflict with 3.14 kernel]
> > > Signed-off-by: weller huang <weller.huang@xxxxxxxxxxxx>
> > > Signed-off-by: George G. Davis <george_davis@xxxxxxxxxx>
> > > ---
> > > gdavis: Confirmed that backport conflicts are due to lack of upstream
> > > 	commits c86d8db ("ext4: implement allocation of pre-zeroed
> > > 	blocks") and 09cbfea ("mm, fs: get rid of PAGE_CACHE_* and
> > > 	page_cache_{get,release} macros") in v3.14.37. The conflict
> > > 	resolution therefore appears to be correct.
> > > ---
> > >  fs/ext4/inode.c | 23 ++++++++++++++---------
> > >  1 file changed, 14 insertions(+), 9 deletions(-)
> > >
> > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > index 58001fc..d33a80e 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -695,6 +695,20 @@ has_zeroout:
> > >  		int ret = check_block_validity(inode, map);
> > >  		if (ret != 0)
> > >  			return ret;
> > > +
> > > +		/*
> > > +		 * Inodes with freshly allocated blocks where contents will be
> > > +		 * visible after transaction commit must be on transaction's
> > > +		 * ordered data list.
> > > +		 */
> > > +		if (map->m_flags & EXT4_MAP_NEW &&
> > > +		    !(map->m_flags & EXT4_MAP_UNWRITTEN) &&
> > > +		    !IS_NOQUOTA(inode) &&
> > > +		    ext4_should_order_data(inode)) {
> > > +			ret = ext4_jbd2_file_inode(handle, inode);
> > > +			if (ret)
> > > +				return ret;
> > > +		}
> > >  	}
> > >  	return retval;
> > >  }
> > > @@ -1059,15 +1073,6 @@ static int ext4_write_end(struct file *file,
> > >  	int i_size_changed = 0;
> > >
> > >  	trace_ext4_write_end(inode, pos, len, copied);
> > > -	if (ext4_test_inode_state(inode, EXT4_STATE_ORDERED_MODE)) {
> > > -		ret = ext4_jbd2_file_inode(handle, inode);
> > > -		if (ret) {
> > > -			unlock_page(page);
> > > -			page_cache_release(page);
> > > -			goto errout;
> > > -		}
> > > -	}
> > > -
> > >  	if (ext4_has_inline_data(inode)) {
> > >  		ret = ext4_write_inline_data_end(inode, pos, len,
> > >  						 copied, page);
> > > --
> > > 1.9.3
> > >
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
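
For readers following the thread, the condition added to ext4_map_blocks() in
the quoted patch can be modelled in userspace roughly as below. This is only a
sketch: the MODEL_* flag values, struct model_inode and
model_needs_ordered_list() are hypothetical stand-ins invented for
illustration, not the kernel's definitions. The two booleans stand in for
IS_NOQUOTA(inode) and ext4_should_order_data(inode).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in flag bits for illustration only. The kernel's EXT4_MAP_NEW and
 * EXT4_MAP_UNWRITTEN are defined differently; only the logic is modelled.
 */
#define MODEL_MAP_NEW       (1u << 0)
#define MODEL_MAP_UNWRITTEN (1u << 1)

/* Hypothetical, reduced view of the inode state the check depends on. */
struct model_inode {
	bool noquota;      /* models IS_NOQUOTA(inode) */
	bool ordered_mode; /* models ext4_should_order_data(inode) */
};

/*
 * Returns true when the inode would have to be filed on the transaction's
 * ordered data list: the mapping allocated fresh blocks, they are attached
 * as a written (not unwritten) extent, the inode is not a quota file, and
 * data=ordered semantics apply to it.
 */
static bool model_needs_ordered_list(const struct model_inode *inode,
				     unsigned int m_flags)
{
	assert(inode != NULL);

	return (m_flags & MODEL_MAP_NEW) &&
	       !(m_flags & MODEL_MAP_UNWRITTEN) &&
	       !inode->noquota &&
	       inode->ordered_mode;
}
```

In this model, a freshly allocated written extent on an ordered-mode inode
yields true, while an unwritten extent, an already mapped block, or a quota
file yields false. In the patch itself, a true result leads to the
ext4_jbd2_file_inode(handle, inode) call.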


