Re: [f2fs-dev] [PATCH 2/2] generic/066: add _require_metadata_replay

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 18 Mar 2015 14:33:53 +1100

On Fri, Feb 27, 2015 at 02:10:55PM +0100, Lukáš Czerner wrote:
> It's interesting, but it really applies only to metadata updates
> since really we normally only journal metadata. We do not
> consider extended attributes to be metadata, do we ?

Just to close the circle here, seeing as I don't think this was
answered: XFS considers all xattrs as metadata.

> > Yes, I'm considering xattrs as metadata (even though they can be seen
> > as data as well). This behaviour I'm testing for applies to ext3/4 and
> > xfs for example (and apparently intentional, since the test passes on
> > these filesystems).
> 
> Ok, I am confused. Clearly neither ext4 nor xfs considers xattrs metadata
> which can be tested simply by attaching xattr and crashing the file
> system immediately afterwards - the new xattr will not be there -
> that's expected for data, but unexpected for metadata.

It is expected of metadata if there was no fsync.

> Now the fact that it works might be just a coincidence. Btw in the
> discussion Dave never mentioned xattr, he only talks about inode
> size and extent list changes which makes sense since those are
> metadata and it's expected to be "stabilised" as he very well
> described. I just do not think this applies to this case.

xattrs are part of the journalled inode metadata in XFS, just like
the size and data extent tree.

> Also I think that his wording that fsync on the file implies fsync
> on the directory is unfortunate because it does not.

POSIX does not define how file/directory synchronisation should
work - it allows fsync() to be a complete no-op, so we are really on
our own here. i.e. we define the behaviour ourselves.

> However it
> implies that the directory will actually be stabilised as well due
> to journalling. But the results are the same.

Exactly - what I've described previously is based on the
transactional model that ext4, XFS and btrfs use - they all use a
strongly ordered atomic transaction model. That is, if we commit
transaction N to stable storage, we also commit N-1, N-2, ... and
N-m. i.e. we commit everything from the last synchronisation point
up to the current sync target.
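
As a rough sketch of that ordering rule, here is a toy model in Python. It
is not how any of these filesystems implement their journals, and the class
and method names (OrderedJournal, log, force) are made up purely for
illustration. It only shows that forcing out a transaction writes
everything since the last synchronisation point, in order:

```python
class OrderedJournal:
    """Toy model of a strongly ordered journal (illustrative only)."""

    def __init__(self):
        self.next_tx = 1        # id the next logged transaction will get
        self.last_synced = 0    # highest tx id known to be on stable storage

    def log(self):
        """Record a new in-memory transaction and return its id."""
        tx = self.next_tx
        self.next_tx += 1
        return tx

    def force(self, target):
        """Commit `target` and every earlier unsynced tx; return the ids written."""
        written = list(range(self.last_synced + 1, target + 1))
        self.last_synced = max(self.last_synced, target)
        return written


journal = OrderedJournal()
for _ in range(5):
    journal.log()

first = journal.force(3)    # writes tx 1, 2 and 3
second = journal.force(5)   # only tx 4 and 5 are left to write
```

Forcing a transaction that is already on disk writes nothing in this
model, which is the "last synchronisation point" part of the rule.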

That gives quite clear dependency rules to fsync. e.g:

	create file "X" in dir "Y" (tx N)
	write 1 byte to X	   (tx N+1)
	fsync X			   (force out tx N, N+1)

When fsync completes, we are guaranteeing that the application will
be able to find the byte we wrote to X. That also implies that
directory Y has a dirent that points to X, and that X has a file
size of 1 and an extent that points to the allocated block.
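
From userspace, that sequence looks something like the sketch below. The
scratch directory comes from tempfile and is an assumption for the sake of
a runnable example; a real test would use a directory on the filesystem
under test. Note that a script like this cannot show what survives a crash.
It only shows the order of operations the guarantee applies to:

```python
import os
import tempfile

# Assumed scratch location; substitute a directory on the filesystem under test.
base = tempfile.mkdtemp()
dir_y = os.path.join(base, "Y")
os.mkdir(dir_y)

path_x = os.path.join(dir_y, "X")
fd = os.open(path_x, os.O_CREAT | os.O_WRONLY, 0o644)   # create file "X" in dir "Y"
try:
    os.write(fd, b"a")                                  # write 1 byte to X
    os.fsync(fd)                                        # fsync X
finally:
    os.close(fd)

# After fsync returns, X should be reachable via Y and have a size of 1.
size = os.stat(path_x).st_size
```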

i.e. fsync() implies that all metadata needed to reference the data
that has been synced is present on disk. That means "fsync X" also
implies "fsync Y" because Y is the only way of finding X. However,
if we do this:

	create file "X" in dir "Y" (tx N)
	write 1 byte to X	   (tx N+1)
	add xattr to Y		   (tx N+2)
	fsync X			   (force out tx N, N+1)

the fsync of X is not guaranteed to stabilise "xattr Y" because that
change occurred *after* the dependency between X and Y was created
and is not required to be synced to resolve the dependency between X
and Y...
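
To make the dependency rule concrete, here is a small Python sketch of the
example above. It is a toy model under stated assumptions: the function
name durable_after_fsync, the tuple layout and the starting transaction id
are all invented for illustration, and it only reports what is
*guaranteed* to be stable, not what a real filesystem might happen to
write out as well:

```python
def durable_after_fsync(ops, target):
    """Return descriptions of ops guaranteed on disk after fsync(target).

    ops is a list of (tx_id, objects_modified, description) in log order.
    Toy rule: fsync forces out the last tx that modified `target` and,
    because the log is strictly ordered, every tx before it.
    """
    touching = [tx for tx, objs, _ in ops if target in objs]
    if not touching:
        return []
    sync_target = max(touching)
    return [desc for tx, _, desc in ops if tx <= sync_target]


N = 10  # arbitrary starting transaction id
ops = [
    (N,     ("X", "Y"), 'create file "X" in dir "Y"'),
    (N + 1, ("X",),     "write 1 byte to X"),
    (N + 2, ("Y",),     "add xattr to Y"),
]

guaranteed = durable_after_fsync(ops, "X")
```

In this model "fsync X" covers tx N and N+1 but not the xattr in tx N+2,
while an fsync of Y would cover all three.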

The devil is in the detail, but we really should see XFS, ext4 and
btrfs all provide the same behaviour w.r.t. metadata and fsync.
Consistency in data integrity behaviour across different
filesystems is a good thing. :)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx