Re: xfstests: generic/342 run failed in f2fs

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 28 Dec 2017 20:09:03 +1100

On Wed, Dec 27, 2017 at 11:11:30AM -0800, Jaegeuk Kim wrote:
> On 12/25, Theodore Ts'o wrote:
> > On Sun, Dec 24, 2017 at 11:47:20PM -0800, Eric Sandeen wrote:
> > > Filesystems are free to do /more/ than the minimum required by posix -
> > > see ext4_sync_parent for example.  Or xfs_finish_rename, for synchronous
> > > mounts:
> > > 
> > >          * If this is a synchronous mount, make sure that the rename transaction
> > >          * goes to disk before returning to the user.
> > >          */
> > >         if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
> > >                 xfs_trans_set_sync(tp);
> > > 
> > > so behavior can be fs-dependent, or mount option dependent, etc.
> > > 
> > > But to be portable, if an app wants directory changes to be persistent
> > > before proceeding, it must fsync the directory after making changes.
> > > 
> > > I don't know about f2fs's design intent, whether it guarantees more
> > > than posix requires, but to guarantee that this test works on any posix
> > > fs, I think that directory fsync is needed.
> > 
> > Agreed that this is a test bug, and we should add the fsync to the
> > parent directory.
> 
> Agreed too. Or, how about using "-o dirsync"?
> 
> > 
> > It might also be a good idea for f2fs to do more, given that fsync is
> > a slow enough operation that so long as you can make sure the fsync of
> > the parent directory happens within the same atomic update as the
> > child inode, you might as well give the more expansive guarantee.  But
> > obviously that's up to the f2fs developers to decide whether they want
> > to do that work.
> 
> Indeed. Actually, since one of our goals was to reduce fsync latencies for
> Android, we decided to support posix in a minimum way. In order to avoid
> complex directory updates recursively, however, we allowed the fsync on
> directories to trigger checkpoint requiring many IO operations.

So what you are really saying is that f2fs is not strictly ordered
w.r.t. metadata crash consistency after fsync()? Wasn't that
considered a bug in btrfs that had to be fixed (and did get fixed)?

Oh, yeah, it's right there in the test commit history:

commit f02fe949113f35ae221ec1ab5c9959912f594bf4
Author: Filipe Manana <fdmanana@xxxxxxxx>
Date:   Tue Apr 5 11:47:55 2016 +1000

    generic: add test for fsync after renaming file

    Test that if we rename a file, create a new file that has the old name
    of the other file and is a child of the same parent directory, fsync the
    new inode, power fail and mount the filesystem, we do not lose the first
    file and that file has the name it was renamed to.

    This test is motivated by an issue found in btrfs which is fixed by the
    following patch for the linux kernel:

      "Btrfs: fix file loss caused by fsync after rename and new inode"

    Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
    Reviewed-by: Eryu Guan <eguan@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>

There's a whole lot more detail in the kernel commit 2be63d5ce929
("Btrfs: fix file loss on log replay after renaming a file and
fsync"), but my point is that we considered this a btrfs filesystem
bug, and so changing the test defeats its purpose as a regression
test for the btrfs bug.
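
For reference, the sequence of operations described in that test
commit message can be sketched roughly as follows. This is only a
minimal Python illustration, not the test itself: the real test also
simulates a power failure and remounts the filesystem, which is not
reproduced here. The function name and the file names "foo" and "bar"
are assumptions made for the example, and the initial fsync merely
stands in for whatever the test does to make the starting state
durable.

```python
import os
import tempfile


def rename_then_fsync_new_file(parent_dir):
    """Rename foo -> bar, create a new foo, and fsync only the new foo."""
    old_path = os.path.join(parent_dir, "foo")
    new_path = os.path.join(parent_dir, "bar")

    # Create the first file and flush it (stand-in for the test's
    # "make the initial state durable" step; an assumption here).
    with open(old_path, "wb") as f:
        f.write(b"first file")
        f.flush()
        os.fsync(f.fileno())

    # Rename the first file. Nothing is fsynced for the rename itself.
    os.rename(old_path, new_path)

    # Create a new file that reuses the old name and fsync only that inode.
    with open(old_path, "wb") as f:
        f.write(b"second file")
        f.flush()
        os.fsync(f.fileno())

    # With strictly ordered metadata, a crash at this point is expected
    # to leave both "bar" (the first file) and "foo" (the new file).
    return sorted(os.listdir(parent_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(rename_then_fsync_new_file(d))
```

Note that the parent directory is never fsynced in this sequence; the
only fsync after the rename is on the newly created file.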

So IMO the test should not be changed. And I think we should be
consistent and consider this f2fs failure as an f2fs bug that needs
fixing to bring its behaviour in line with xfs, ext4, and btrfs.

Remember this when quoting POSIX about fsync behaviour: POSIX is a
terrible standard when it comes to data integrity. We go way, way
beyond what POSIX specifies as a valid fsync implementation (i.e.
POSIX allows "do nothing and return success" as a conformant
implementation). Ext4, XFS and btrfs all have strictly ordered
metadata crash recovery semantics and all of the crash recovery
tests expect this behaviour from the filesystem being tested. The
underlying intention is that by encoding it into these tests, all
widely used Linux filesystems, current and future, meet or exceed
this crash integrity requirement.
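
For contrast, an application that relies only on the portable minimum
Eric describes in the quoted text (fsync the directory after changing
it) would have to do something like the following itself. This is a
minimal sketch assuming Linux, where a directory can be opened
read-only and passed to fsync(); the helper names fsync_directory and
durable_rename are hypothetical.

```python
import os
import tempfile


def fsync_directory(dir_path):
    """Open a directory read-only and fsync it to flush its entries."""
    fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_rename(src, dst):
    """Rename src to dst, then fsync the parent directories involved."""
    src_dir = os.path.dirname(os.path.abspath(src))
    dst_dir = os.path.dirname(os.path.abspath(dst))

    os.rename(src, dst)

    # Flush the directory that gained the new name, and the one that
    # lost the old name if it is a different directory.
    fsync_directory(dst_dir)
    if src_dir != dst_dir:
        fsync_directory(src_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "foo")
        dst = os.path.join(d, "bar")
        with open(src, "wb") as f:
            f.write(b"data")
        durable_rename(src, dst)
        print(sorted(os.listdir(d)))
```

The test under discussion does not take this extra step, which is
exactly why it distinguishes strictly ordered filesystems from ones
that implement only the minimum.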

IOWs, changing the test is the wrong thing to do on many levels....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx