Re: [f2fs-dev] [PATCH] generic/066: attr1 is still there after log replay on f2fs


On Wed, Mar 09, 2022 at 03:34:27PM +0800, Chao Yu wrote:
> On 2022/3/9 14:22, Dave Chinner wrote:
> > On Wed, Mar 09, 2022 at 12:31:17PM +0800, Chao Yu wrote:
> > > On 2022/2/28 11:57, Sun Ke via Linux-f2fs-devel wrote:
> > > > The test fail on f2fs:
> > > >        xattr names and values after second fsync log replay:
> > > >        # file: SCRATCH_MNT/foobar
> > > >       +user.attr1="val1"
> > > >        user.attr3="val3"
> > > > 
> > > > attr1 is still there after log replay.
> > > > I guess it is f2fs's special feature to improve the performance.
> > > > 
> > > > Signed-off-by: Sun Ke <sunke32@xxxxxxxxxx>
> > > > ---
> > > > 
> > > > Is it a BUG on f2fs?
> > > 
> > > I don't think so. It fails because f2fs doesn't follow the recovery rule
> > > that btrfs/ext4/xfs do, but that doesn't mean f2fs has broken the POSIX
> > > semantics of fsync().
> > 
> > I disagree.  A failure in this test is indicative of non-conformance
> > with the Linux definition of fsync() behaviour.
> > 
> > You are right in that it does not break POSIX fsync semantics, but
> > POSIX allows "do nothing" as a valid implementation. However,
> > because of this loophole, the POSIX definition is complete garbage
> > and we do not use it.
> > 
> > The behaviour that Linux filesystems are supposed to implement is
> > defined in the Linux fsync() man page, and it goes way beyond what
> > POSIX requires:
> > 
> > $ man fsync
> > ....
> > DESCRIPTION
> >      fsync() transfers ("flushes") all modified in-core data of
> >      (i.e., modified buffer cache pages for) the file referred to by
> >      the file descriptor fd to the disk device (or other permanent
> >      storage device) so that all changed information can be retrieved
> >      even if the  system  crashes  or  is rebooted.  This includes
> >      writing through or flushing a disk cache if present.  The call
> >      blocks until the device reports that the transfer has
> >      completed.
> > 
> >      As well as flushing the file data, fsync() also flushes the
> >      metadata information associated with the file (see inode(7)).
> > ....
> > 
> > IOWs, fsync() on Linux is supposed to persist all data and
> > metadata associated with the inode to stable storage such that it
> > can be retrieved after a crash or reboot. "metadata information"
> > includes xattrs attached to the inode that is being fsync()d.
> 
> Quoted from g/066:
> 
> echo "hello world!" >> $SCRATCH_MNT/foobar
> $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
> $SETFATTR_PROG -x user.attr1 $SCRATCH_MNT/foobar
> ln $SCRATCH_MNT/foobar $SCRATCH_MNT/foobar_link
> touch $SCRATCH_MNT/qwerty
> $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/qwerty
> 
> IIUC, to match what the Linux fsync() manual requires, if we want to persist the
> xattr removal, we should call fsync() on $SCRATCH_MNT/foobar after
> "$SETFATTR_PROG -x user.attr1 $SCRATCH_MNT/foobar", rather than calling fsync()
> on the unrelated $SCRATCH_MNT/qwerty?

It might look that way, but it's not that straightforward: there's
a carefully constructed object dependency chain in this test that
defines what the correct behaviour should be here.

What's the link count of $SCRATCH_MNT/foobar when
$SCRATCH_MNT/qwerty is present after recovery? Is it 1 or 2?  Does
$SCRATCH_MNT/foobar_link exist?  And if $SCRATCH_MNT/foobar_link
exists, is the link count 2? The test doesn't even look at these
things, but if user.attr1 is not present, it means that foobar_link
and qwerty are present, $SCRATCH_MNT has a link count of 5 and
foobar has a link count of 2 because that's the dependency chain
that leads to the user.attr1 removal being recovered correctly.

So what does SCRATCH_MNT actually contain when f2fs fails this test?
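
One way to answer that question is to dump the relevant state by hand after
log replay. The following is only a minimal sketch: report_state is a
hypothetical helper that is not part of generic/066 or fstests, and it
assumes GNU stat (for -c) and, optionally, getfattr from the attr package.

```shell
# Hypothetical helper, not part of generic/066: print the state that the
# dependency chain implies. Assumes GNU stat (-c), as found on Linux.
report_state() {
    dir=$1

    # Link count of the directory itself.
    echo "dir nlink=$(stat -c %h "$dir")"

    # Link count of foobar, if it survived recovery.
    if [ -e "$dir/foobar" ]; then
        echo "foobar nlink=$(stat -c %h "$dir/foobar")"
    else
        echo "foobar missing"
    fi

    # Presence of the other two names the test creates.
    for name in foobar_link qwerty; do
        if [ -e "$dir/$name" ]; then
            echo "$name present"
        else
            echo "$name missing"
        fi
    done

    # Dump xattrs only if getfattr is installed; ignore errors on
    # filesystems that do not support user xattrs.
    if command -v getfattr >/dev/null 2>&1 && [ -e "$dir/foobar" ]; then
        getfattr --absolute-names -d "$dir/foobar" 2>/dev/null || true
    fi
    return 0
}
```

Running `report_state "$SCRATCH_MNT"` after the scratch filesystem has been
remounted shows whether the link counts, the directory entries and the
xattrs agree with each other.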

These dependencies exist because we can't just randomly re-order
recovery of modifications to individual inodes, and certain
operations create atomic change dependencies between inodes. It's
those atomic change dependencies that this test is actually
exercising.  i.e. the link count changes tie directory modifications
to inode modifications and this creates cross-object ordering
dependencies down the line that fsync then exposes. That's what the
second part of this test is actually exercising....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


