Re: [PATCH 1/5] vfs: vfs-level fiemap interface

Andreas Dilger <adilger@xxxxxxx> · Wed, 28 May 2008 10:09:52 -0600

On May 27, 2008  11:31 -0700, Mark Fasheh wrote:
> > > +	/*
> > > +	 * The VFS does basic sanity checks on the flag
> > > +	 * value. Individual file systems can also reject otherwise
> > > +	 * 'valid' flags by returning -EBADR from ->fiemap.
> > > +	 */
> > > +	incompat_flags = fiemap.fm_flags & ~FIEMAP_FLAGS_COMPAT;
> > > +	if (incompat_flags)
> > > +		goto out_bad_flags;
> > 
> > Please remove this, and leave the checking for the individual filesystems.
> > They all need to do it anyways, because they can't depend on the check here
> > never changing or always being updated when this check does.
> 
> I hope that when we add new feature flags to fiemap, we take the time to
> do it right and add it to FIEMAP_FLAGS_COMPAT, which is defined in the same
> header for that reason. Likewise, we can't just add flags in any case
> without considering how they affect existing users of ->fiemap. If we add a
> flag that everyone can implement for example, we'll have to touch all file
> systems anyway.

Well, the problem is that older kernel will not have the most current
upstream FIEMAP_FLAGS_COMPAT, and short of patching the kernel to change
this ONE value (which we've worked for a long time to not do on the
client) there is no way that support for additional upstream flags can
be added.

Because the generic ioctl handling is done before the filesystem-specific
one (sys_ioctl->vfs_ioctl->file_ioctl->do_ioctl->(f_op->ioctl()), we can't
override the FIEMAP ioctl handling in the filesystem, regardless of what
we do with the ->fiemap() method.

> > At best it is redundant with the check in the filesystem, at worst it
> > will lead to bugs in the filesystem-specific code due to inconsistent
> > checks.
> 
> I makes sense to me that the VFS should verify that flag values passed
> are valid in that they've been defined by the kernel <-> user interface. The
> file systems can and should still reject flags which they can not support.
>
> So the checks are *designed* to not be consistent, but rather allow the
> client file system to whittle down the supported flags further.

But the problem is that people are error prone in their updating of code,
and if the filesystems assume "the VFS has checked all of the flags except
this one I don't understand" will likely become incorrect over time as
someone will forget, will misunderstand whether the different per-fs codes
need to be updated, or some patch will be delayed in a FS maintainer queue
while the VFS "acceptance" of the new feature will be included upstream.

I don't think it is safe for filesystems to depend on the VFS check, and
if the filesystems need to do their own complete check then the VFS doesn't
need to do it at all.  There isn't really much that the VFS cares about
the fm_flags field, with NUM_EXTENTS being the only notable exception.
The proposed FIEMAP_FLAG_METADATA is again something that the VFS doesn't
care about.  Even then, since the filesystems would do their own check
and would return -EBADR there isn't anything bad that could happen.

> Otherwise, it seems to me that client file systems could start to define
> their own flags which might eventually stomp on those we'd like to add
> later.

It would be foolish to do so, as long as it isn't impossible to get new
flags defined in the upstream kernel.

The issue is that most users don't have the latest upstream kernel
because they are using a vendor kernel that is a few years old, as you
likely know, but an updated Lustre or OCFS2 or btrfs should work with
the existing vendor kernels.

If we wanted to add something like FIEMAP_FLAG_METADATA, if the check
was done in the VFS, it would be impossible without patching the client
even if it exactly matched the upstream kernel implementation.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html