Re: [PATCH 1/5] vfs: vfs-level fiemap interface

Andreas Dilger <adilger@xxxxxxx> · Thu, 29 May 2008 17:46:48 -0600

On May 28, 2008  10:24 -0700, Joel Becker wrote:
> On Wed, May 28, 2008 at 10:09:52AM -0600, Andreas Dilger wrote:
> > But the problem is that people are error prone in their updating of code,
> > and if the filesystems assume "the VFS has checked all of the flags except
> > this one I don't understand", that assumption will likely become incorrect
> > over time: someone will forget, will misunderstand whether the different
> > per-fs codes need to be updated, or some patch will be delayed in a FS
> > maintainer queue while the VFS "acceptance" of the new feature is
> > included upstream.
> 
> 	This is a specious argument - if it doesn't go upstream, we
> then have the overloaded-flag problem.

I was actually thinking of the opposite case - the VFS part of the new
flag is included upstream (i.e. ioctl_fiemap() allows the new flag),
but the filesystem-specific part is delayed by some maintainer (or lack
thereof).
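
To make that case concrete, here is a rough user-space sketch.  It is not
code from the patch: every name and flag value below is hypothetical, and
-EINVAL stands in for whatever error the real interface would return.  It
only shows how a filesystem that trusts the generic check ends up silently
accepting a flag it does not implement, while one that checks its own mask
rejects it.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative flag bits; names and values are assumptions for this sketch. */
#define DEMO_FLAG_SYNC   0x1u
#define DEMO_FLAG_XATTR  0x2u   /* stands in for the "new" flag */

/* Flags the generic layer accepts: already updated upstream. */
#define DEMO_VFS_KNOWN     (DEMO_FLAG_SYNC | DEMO_FLAG_XATTR)

/* Flags this filesystem really implements: its patch is still queued. */
#define DEMO_FS_SUPPORTED  DEMO_FLAG_SYNC

/* Handler that assumes the generic layer validated everything. */
static int demo_fs_fiemap_trusting(uint32_t flags)
{
    (void)flags;            /* DEMO_FLAG_XATTR is silently ignored here */
    return 0;
}

/* Handler that rejects any flag outside its own supported mask. */
static int demo_fs_fiemap_checking(uint32_t flags)
{
    if (flags & ~DEMO_FS_SUPPORTED)
        return -EINVAL;
    return 0;
}

/* Generic entry point: rejects only flags it has never heard of. */
static int demo_vfs_fiemap(uint32_t flags, int (*fs_op)(uint32_t))
{
    if (flags & ~DEMO_VFS_KNOWN)
        return -EINVAL;
    return fs_op(flags);
}
```

With these definitions, passing DEMO_FLAG_XATTR through the trusting
handler returns 0 even though nothing honoured the flag, while the checking
handler returns -EINVAL.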

We've had an ongoing issue with ext4 because we need EXPORT_SYMBOL(zero_page),
but that change has not made it through the m68k maintainer, while the ext4
part of the patch is already upstream, and Andrew complains about it regularly.

> If you're looking for vendor flags, let's just design a space for them.

By no means am I looking for "private" flags or adding support for flags
that don't exist upstream (assuming it is reasonable to get new flags
upstream).  What I'm specifically concerned about is being able to support
new features that are properly accepted upstream in Lustre built against
older vendor kernels.  We are trying to get out of the kernel-patching
days because customers aren't willing to void their kernel or 3rd-party
application support by running a patched kernel on the client.

Since this is a relatively new API, I think several features like
FIEMAP_FLAG_XATTR, FIEMAP_FLAG_METADATA, and maybe a few others will be
added in the next several months, and some vendor will grab one of the
"has FIEMAP, but not all of the flags" kernels and we won't be able to
add newer features on that kernel for possibly several years.

> > The issue is that most users don't have the latest upstream kernel
> > because they are using a vendor kernel that is a few years old, as you
> > likely know, but an updated Lustre or OCFS2 or btrfs should work with
> > the existing vendor kernels.
> > 
> > If we wanted to add something like FIEMAP_FLAG_METADATA, if the check
> > was done in the VFS, it would be impossible without patching the client
> > even if it exactly matched the upstream kernel implementation.
> 
> 	First, getting vendor kernels to update a supported flag set
> that is already in mainline is pretty easy.  They are rightly interested
> in following a well-defined interface, which is what Mark's trying to do
> - no filesystems supporting flags that aren't part of the well-defined
> interface.

Reasonably so, yes.  The issue is that everyone is busy, and what may
be a priority for us isn't necessarily one for the vendor.  There is
then another hurdle in getting the customer to upgrade the kernels on
their 10000-node cluster just to add some bits to the compatibility flags.
Being able to add in e.g. FIEMAP_FLAG_XATTR ourselves is easier.

> 	But if you are really worried about no kernel updates when you
> install a new fs module, you can still solve it with a generic check.
> Just add /proc/sys/fs/fiemap-flag-mask.  This covers any new flags for
> the generic VFS check.  Alternately, allow filesystems to register their
> flags and then do the VFS check based on that.

If you are suggesting that the filesystems all export their "supported
flags" mask somewhere, and the VFS uses that for a check, then yes, I agree
it would be possible to do.  At that point, though, I don't see a huge
benefit over just letting the filesystems do the check directly themselves.
Adding a /proc or /sys or /debugfs tunable for this seems heavyweight, and
it needs a sysctl or other setting on each boot - a pain for diskless clients.
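
For reference, the "filesystems export a mask, VFS checks it" variant could
look roughly like the user-space sketch below.  The struct, its field names
and the flag values are all made up for illustration, and -EINVAL is again a
placeholder error code.  The check is the same single mask test either way;
what changes is only which side performs it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; names and values are assumptions for this sketch. */
#define SKETCH_FLAG_SYNC   0x1u
#define SKETCH_FLAG_XATTR  0x2u

/* Hypothetical per-filesystem operations table carrying its own mask. */
struct sketch_fs_ops {
    uint32_t fiemap_flags;              /* flags this filesystem supports */
    int (*fiemap)(uint32_t flags);      /* handler, may be NULL */
};

/* Trivial handler standing in for a real extent-mapping routine. */
static int sketch_fs_fiemap(uint32_t flags)
{
    (void)flags;
    return 0;
}

/* An older module that knows only SYNC, and a newer one that adds XATTR. */
static const struct sketch_fs_ops sketch_old_fs = {
    .fiemap_flags = SKETCH_FLAG_SYNC,
    .fiemap       = sketch_fs_fiemap,
};
static const struct sketch_fs_ops sketch_new_fs = {
    .fiemap_flags = SKETCH_FLAG_SYNC | SKETCH_FLAG_XATTR,
    .fiemap       = sketch_fs_fiemap,
};

/* Generic check driven by the mask the filesystem registered. */
static int sketch_vfs_fiemap(const struct sketch_fs_ops *ops, uint32_t flags)
{
    if (ops == NULL || ops->fiemap == NULL)
        return -EOPNOTSUPP;
    if (flags & ~ops->fiemap_flags)
        return -EINVAL;
    return ops->fiemap(flags);
}
```

In this arrangement a newer filesystem module can accept a new flag on an
old kernel simply by shipping a wider mask, with no generic code changed.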

It seems backward to me to add arbitrary limits to the API when it was
designed in the first place to be flexible and allow features to be
added easily.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
