Re: [PATCH 1/5] vfs: vfs-level fiemap interface

Andreas Dilger <adilger@xxxxxxx> · Wed, 28 May 2008 13:42:15 -0600

On May 24, 2008  17:01 -0700, Mark Fasheh wrote:
> Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
> inode operation.

Mark, I was looking at a way to remove the special-casing of NUM_EXTENTS
from ioctl_fiemap() in an effort to remove Christoph's objection to
keeping these in the same ioctl.

I think it is possible and reasonable to move the special-case handling
into fiemap_fill_next_extent().

> +static int ioctl_fiemap(struct file *filp, unsigned long arg)
> +{
> +
> +	if (!(fiemap.fm_flags & FIEMAP_FLAG_NUM_EXTENTS) &&
> +	    (fiemap.fm_extent_count == 0 ||
> +	     fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS))
> +		return -EINVAL;

This can be changed to only check:

	if (fm_extent_count > FIEMAP_MAX_EXTENTS)
		return -EINVAL;

> +	fieinfo.fi_flags = fiemap.fm_flags;
> +	if (!(fiemap.fm_flags & FIEMAP_FLAG_NUM_EXTENTS)) {
> +		fieinfo.fi_extents_max = fiemap.fm_extent_count;
> +		fieinfo.fi_extents_start = (char *)arg + sizeof(fiemap);
> +
> +		if (!access_ok(VERIFY_WRITE, fieinfo.fi_extents_start,
> +			       fieinfo.fi_extents_max * sizeof(struct fiemap_extent)))
> +			return -EFAULT;
> +	}

It is harmless to set fi_extents_max and fi_extents_start, as this is
ignored by NUM_EXTENTS.  The fiemap_fill_next_extent() will already
check in copy_to_user() whether the fi_extents_start pointer is valid,
and fiemap_fill_next_extent() doesn't even get far enough to look at
fi_extents_max or fi_extents_start.  We just do:

	fieinfo.fi_extents = fiemap.fm_extent_count;
	fieinfo.fi_extents_start = (struct fiemap_extent *)((char *)arg +
							    sizeof(fiemap));

This leaves us with no checks for FIEMAP_FLAG_NUM_EXTENTS in ioctl_fiemap()
at all, and no changes needed in fiemap_fill_next_extent().

> > What about the idea to have fiemap_fill_next_extent() do "extent" merging
> > for filesystems that use the generic helper but do not return multiple
> > blocks via get_blocks()?  I don't think that is too hard to implement,
> > and makes the output more useful, otherwise we get an extent per block.  
> > The above is what I _think_ will work, haven't actually tried it out.
> 
> I don't think we want to automatically merge extents within this helper
> function. Otherwise we would diverge from the actual disk layout for extent
> based file systems where an extent might be broken up between two records
> for some other reason, such as maximum extent length being exceeded.

Do we really want to expose the filesystem-specific extent-length limits
to userspace?  In some sense, a block-based filesystem has a maximum
extent length of the blocksize, but it seems totally reasonable to merge
contiguous blocks into a single "extent" for return to userspace.  I
don't see this significantly different for ext4, even though it can have
extents up to 128MB, or unwritten extents up to 64MB.

> Btw, how many block-based file systems that don't return multiple blocks via
> get_blocks() are there that we actually care about enough to write this
> code?

That I have no clue about.  Joseph?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html