Re: [PATCH 0/4] Fiemap, an extent mapping ioctl - round 2

Jamie Lokier <jamie@xxxxxxxxxxxxx> · Fri, 4 Jul 2008 13:13:25 +0100

Dave Chinner wrote:
> The point of this SYNC flag is to ensure that you get nothing other
> than blocks mapped to disk - no delalloc regions, etc. The only sane
> way to do that is an atomic 'sync+map' operation. This is not a
> filesystem specific feature - it's what the SYNC flag should be
> defined as providing.

Wait a minute.

I think Jim, and you Dave, have imagined different use-cases
for FIEMAP - and that's the reason for this difference of opinion.

The two use-cases are:

    1. To get a detailed fragmentation report, which is guidance (and
       can only be guidance: it may be invalid the moment it's returned).

    2. To get a block mapping suitable for _reading_ those blocks from
       the physical device directly (e.g. LILO).

For 1, atomic 'sync+map' does make sense, if you want the report to
not have any delalloc extents, and you want to operate on files which
are being modified by other processes.

For 2, Jim appears to be correct that atomic 'sync+map' is not useful.
You can only read blocks if the mapping remains stable after returning
it, which means the application _must_ ensure no process is modifying
the file, and that it's on a filesystem which doesn't arbitrarily move
blocks when it feels like it.  Given that,
'make_sure_nothing_modifies; atomic(sync + map); read data;
ok_you_can_modify' is no different from 'make_sure_nothing_modifies;
fsync(); map; read data; ok_you_can_modify'.

> The fact that it's only implemented in XFS right now has absolutely
> *zero* consideration in determining this feature is necessary or
> not.

You're right, but that's not what Jim's arguing.  He's saying the
feature isn't necessary since it provides no _dependable_ semantic
guarantees, and therefore arguments to keep it are for legacy
compatibility alone.  That may be reason enough to keep it, though.

However, he's mistaken.  You've explained that it does provide a
guarantee: the resulting map will be valid for a consistent snapshot
of the file at some instant in time during the FIEMAP call.  In other
words, with concurrent modifiers, atomic sync+map ensures no delalloc
regions (is there anything else?) in the map, while fsync() + map gets
close but does not ensure it.  But either way, with concurrent
modifiers, you can only use the result for guidance, in a
fragmentation report, so is preventing delalloc regions actually
useful?  Maybe it is.  It would be good to see an example, though.

Dave, can you give an actual situation where you have seen atomic
'sync+map' used with XFS where it is necessary for an application to
behave correctly?  I'm having trouble thinking of one, other than "the
current app code doesn't know what to do with a delalloc extent".

I'm thinking when those programs are updated to use the new interface,
wouldn't it be better to update them to handle delalloc extents
(treating them as "region unknown" in fragmentation reports), because
some filesystems won't support atomic sync+map anyway?

Finally, if the real intent here is "the returned map for
fragmentation report shall not include any delalloc extents", perhaps
that should be the request flag instead?  There are other ways to
ensure that which don't require blocking concurrent modifications, for
potentially a significant time (esp. block-based filesystems).  We
like lock-free algorithms these days, if the results are suitable.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html