Re: [f2fs-dev] [PATCH] generic/066: attr1 is still there after log replay on f2fs

Sun Ke <sunke32@xxxxxxxxxx> · Fri, 11 Mar 2022 15:14:04 +0800

On 2022/3/10 15:33, Chao Yu wrote:
On 2022/3/10 9:41, Dave Chinner wrote:
On Wed, Mar 09, 2022 at 03:34:27PM +0800, Chao Yu wrote:
On 2022/3/9 14:22, Dave Chinner wrote:
On Wed, Mar 09, 2022 at 12:31:17PM +0800, Chao Yu wrote:
On 2022/2/28 11:57, Sun Ke via Linux-f2fs-devel wrote:
The test fails on f2fs:
        xattr names and values after second fsync log replay:
        # file: SCRATCH_MNT/foobar
       +user.attr1="val1"
        user.attr3="val3"

attr1 is still there after log replay.
I guess it is f2fs's special feature to improve the performance.

Signed-off-by: Sun Ke <sunke32@xxxxxxxxxx>
---

Is it a BUG on f2fs?

I don't think so. It fails because f2fs doesn't follow the recovery rule
that btrfs/ext4/xfs follow, but that doesn't mean f2fs breaks the POSIX
semantics of fsync().

I disagree.  A failure in this test is indicative of non-conformance
with the Linux definition of fsync() behaviour.

You are right in that it does not break POSIX fsync semantics, but
POSIX allows "do nothing" as a valid implementation. However,
because of this loophole, the POSIX definition is complete garbage
and we do not use it.

The behaviour that Linux filesystems are supposed to implement is
defined in the Linux fsync() man page, and it goes way beyond what
POSIX requires:

$ man fsync
....
DESCRIPTION
      fsync() transfers ("flushes") all modified in-core data of
      (i.e., modified buffer cache pages for) the file referred to by
      the file descriptor fd to the disk device (or other permanent
      storage device) so that all changed information can be retrieved
      even if the  system  crashes  or  is rebooted.  This includes
      writing through or flushing a disk cache if present.  The call
      blocks until the device reports that the transfer has
      completed.

      As well as flushing the file data, fsync() also flushes the
      metadata information associated with the file (see inode(7)).
....

IOWs, fsync() on Linux is supposed to persist all data and
metadata associated with the inode to stable storage such that it
can be retrieved after a crash or reboot. "metadata information"
includes xattrs attached to the inode that is being fsync()d.

Quoted from g/066:

echo "hello world!" >> $SCRATCH_MNT/foobar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
$SETFATTR_PROG -x user.attr1 $SCRATCH_MNT/foobar
ln $SCRATCH_MNT/foobar $SCRATCH_MNT/foobar_link
touch $SCRATCH_MNT/qwerty
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/qwerty

IIUC, to match what the Linux fsync() manual requires, if we want to
persist the xattr removal, shouldn't we call fsync() on
$SCRATCH_MNT/foobar after
"$SETFATTR_PROG -x user.attr1 $SCRATCH_MNT/foobar", rather than calling
fsync() on the unrelated $SCRATCH_MNT/qwerty?

It might look that way, but it's not that straightforward: there's
a carefully constructed object dependency chain in this test that
defines what the correct behaviour should be here.

What's the link count of $SCRATCH_MNT/foobar when
$SCRATCH_MNT/qwerty is present after recovery? Is it 1 or 2?  Does
$SCRATCH_MNT/foobar_link exist?  And if $SCRATCH_MNT/foobar_link
exists, is the link count 2?  The test doesn't even look at these
things, but if user.attr1 is not present, it means that foobar_link
and qwerty are present, $SCRATCH_MNT has a link count of 5 and
foobar has a link count of 2 because that's the dependency chain
that leads to the user.attr1 removal being recovered correctly.
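
The state described above could be inspected with a small helper like the
one below. This is only a sketch: report_state is a hypothetical function,
not part of generic/066 or the fstests helpers, and it assumes GNU
coreutils stat (for "-c %h"). It prints the hard link count of foobar and
whether foobar_link and qwerty exist under the directory passed in.

```shell
# Hypothetical helper (not part of fstests): report the objects that the
# dependency chain in generic/066 ties together.
# Usage: report_state <mount point>
report_state() {
    dir=$1

    # Hard link count of foobar: 2 if the "ln" was recovered, 1 if not.
    # Assumes GNU stat; "-c %h" prints the number of hard links.
    echo "foobar nlink: $(stat -c %h "$dir/foobar")"

    # Existence of the other two names created by the test.
    for name in foobar_link qwerty; do
        if [ -e "$dir/$name" ]; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}
```

Running "report_state $SCRATCH_MNT" after the log replay shows whether the
link and the xattr removal were recovered together or not at all.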

So what does SCRATCH_MNT actually contain when f2fs fails this test?

After f2fs recovery,

SCRATCH_MNT contains two files, foobar and qwerty. The link count of both
files is 1, and foobar has two xattr entries: user.attr1 and user.attr3.

So it means f2fs only recovers files/directories that were fsync()ed
before a sudden power-off (SPO). Since f2fs doesn't support fs-op level
transactions, it has no way to persist all metadata updates in one
transaction.

There is one alternative way to pass this case: as I suggested, we can
use the "fastboot" mount option, so that during the last fsync on qwerty,
f2fs triggers a checkpoint, which persists all metadata updates made
before the fsync()...

Thanks,

The test passes with the "fastboot" mount option. I will send v2.

Thanks,
Sun Ke

These dependencies exist because we can't just randomly re-order
recovery of modifications to individual inodes and certain
operations create atomic change dependencies between inodes. It's
those atomic change dependencies that this test is actually
exercising.  i.e. the link count changes tie directory modifications
to inode modifications and this creates cross-object ordering
dependencies down the line that fsync then exposes. That's what the
second part of this test is actually exercising....

Cheers,

Dave.