Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

Jan Kara <jack@xxxxxxx> · Wed, 29 Mar 2017 13:15:07 +0200

On Tue 21-03-17 14:46:53, Jeff Layton wrote:
> On Tue, 2017-03-21 at 14:30 -0400, J. Bruce Fields wrote:
> > On Tue, Mar 21, 2017 at 01:23:24PM -0400, Jeff Layton wrote:
> > > On Tue, 2017-03-21 at 12:30 -0400, J. Bruce Fields wrote:
> > > > - It's durable; the above comparison still works if there were reboots
> > > >   between the two i_version checks.
> > > > 	- I don't know how realistic this is--we may need to figure out
> > > > 	  if there's a weaker guarantee that's still useful.  Do
> > > > 	  filesystems actually make ctime/mtime/i_version changes
> > > > 	  atomically with the changes that caused them?  What if a
> > > > 	  change attribute is exposed to an NFS client but doesn't make
> > > > 	  it to disk, and then that value is reused after reboot?
> > > > 
> > > 
> > > Yeah, there could be atomicity there. If we bump i_version, we'll mark
> > > the inode dirty and I think that will end up with the new i_version at
> > > least being journalled before __mark_inode_dirty returns.
> > 
> > So you think the filesystem can provide the atomicity?  In more detail:
> > 
> 
> Sorry, I hit send too quickly. That should have read:
> 
> "Yeah, there could be atomicity issues there."
> 
> I think providing that level of atomicity may be difficult, though
> maybe there's some way to make the querying of i_version block until
> the inode update has been journalled?

Just to complement what Dave said, from the ext4 side: as with XFS, ext4
doesn't guarantee atomicity unless fsync() has completed on the file.
Until then, you can see an arbitrary combination of data & i_version after
a crash. We take care to keep data and metadata in sync only when there
are security implications (like exposing uninitialized disk blocks);
otherwise we are as lazy as we can be, to improve performance...
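
To illustrate the userspace side of that rule, here is a minimal sketch,
assuming a plain POSIX environment. The helper name write_durably() is
hypothetical and not part of any patch in this series; it only shows that
the application has to wait for fsync() to succeed before relying on the
on-disk state surviving a crash.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not kernel or ext4 code):
 * write len bytes from buf to path and return 0 only once fsync()
 * has completed. Until fsync() returns successfully, the state seen
 * after a crash should be treated as unspecified.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	/* write() may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}

	/* The point after which the update can be relied upon. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}

	return close(fd);
}
```

A caller would treat a non-zero return as "nothing can be assumed about
what is on disk", which matches the lazy behaviour described above.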

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR