Re: [PATCH 1/5] vfs: vfs-level fiemap interface

Andreas Dilger <adilger@xxxxxxx> · Sun, 25 May 2008 01:28:16 -0600

On May 24, 2008  17:01 -0700, Mark Fasheh wrote:
> Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
> inode operation.

Mark, this appears to be missing most of my previous comments.  Are
you getting my emails, or is there still a problem?

> +++ b/Documentation/filesystems/fiemap.txt
> +A fiemap request is encoded within struct fiemap:
> +
> +struct fiemap {
> +	__u64	fm_start;	 /* logical offset (inclusive) at
> +				  * which to start mapping (in) */
> +	__u64	fm_length;	 /* logical length of mapping which
> +				  * userspace cares about (in) */
> +	__u32	fm_flags;	 /* FIEMAP_FLAG_* flags for request (in) */

This should be marked (in/out) because of error return, documented below.

> +* FIEMAP_EXTENT_UNKNOWN
> +The location of this extent is currently unknown. This may indicate
> +the data is stored on an inaccessible volume or that no storage has
> +been allocated for the file yet.
> +
> +* FIEMAP_EXTENT_SECONDARY
> +  - This will also set FIEMAP_EXTENT_UNKNOWN.
> +The data for this extent is in secondary storage.

This is not correct that it will always set FIEMAP_EXTENT_UNKNOWN here.
It is possible (common) for data to be stored on both primary AND
secondary storage at the same time.  It is still desirable to mark the
data as FIEMAP_EXTENT_SECONDARY even if it is still resident in the
filesystem, because this informs the user that the HSM/backup system
has made a copy of this file already.

> +For each extent in the request range, the file system should call
> +the helper function, fiemap_fill_next_extent():
> +
> +int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> +			    u64 phys, u64 len, u32 flags, u32 lun);
> +
> +fiemap_fill_next_extent() will use the passed values to populate the
> +next free extent in the fm_extents array. 'General' extent flags will
> +automatically be set from specific flags on behalf of the calling file
> +system so that the userspace API is not broken.
> +
> +fiemap_fill_next_extent() returns 0 on success, and 1 when the
> +user-supplied fm_extents array is full. If an error is encountered
> +while copying the extent to user memory, -EFAULT will be returned.
> +
> +If the request has the FIEMAP_FLAG_NUM_EXTENTS flag set, then calling
> +this helper is not necessary and fi_extents_mapped can be set
> +directly.

It is worthwhile to mention here that if FIEMAP_FLAG_NUM_EXTENTS IS set,
it is also OK for the filesystem to call fiemap_fill_next_extent(), and
only the number of extents will be counted, avoiding the need to special-
case this flag in the filesystem implementation.

What about the idea to have fiemap_fill_next_extent() do "extent" merging
for filesystems that use the generic helper but do not return multiple
blocks via get_blocks()?  I don't think that is too hard to implement,
and makes the output much more useful, otherwise we get an extent per block.
The above is what I _think_ will work, haven't actually tried it out.

struct fiemap_extnent_info {
	struct fiemap_extent	fi_last;	/* Copy of previous extent */
	unsigned int	fi_flags;		/* Flags as passed from user */
	unsigned int	fi_extents_mapped;      /* Number of mapped extents */
	unsigned int	fi_extents_max;		/* fiemap_extent array size */
	char            *fi_extents_start;	/* fiemap_extent array start */
}

int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo,
			    u64 logical, u64 phys, u64 len, u32 flags, u32 lun)
{
	struct fiemap_extent *extent = &fieinfo->fi_extent_last;
	int joined = 1;

	if (flags & SET_UNKNOWN_FLAGS)
		flags |= FIEMAP_EXTENT_UNKNOWN;
	if (flags & SET_NO_DIRECT_FLAGS)
		flags |= FIEMAP_EXTENT_NO_DIRECT;
	if (flags & SET_NOT_ALIGNED_FLAGS)
		flags |= FIEMAP_EXTENT_NOT_ALIGNED;

	/* If this is an extension of the previous extent, add it in */
	if (fieinfo->fi_extents_mapped > 0 &&
	    extent->fe_logical + extent->fe_length == logical &&
	    extent->fe_physical + extent->fe_length == phys &&
	    extent->fe_flags == (flags & ~FIEMAP_FLAG_LAST) &&
	    extent->fe_lun == lun) {
	    	extent->fe_length += len;
		extent->fe_flags = flags;	/* maybe set FIEMAP_FLAG_LAST */

		if (!(fieinfo->fi_flags & FIEMAP_FLAG_NUM_EXTENTS))
			fieinfo->fe_extents_start -= sizeof(*extent);
		fieinfo->fi_extents_mapped--;

		joined = 1;
	}

	if (fieinfo->fi_flags & FIEMAP_FLAG_NUM_EXTENTS) {
		fieinfo->fi_extents_mapped++;
		return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
	}

	if (joined)
		goto copy;

	if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
		return 1;

	/* Store new extent info in fi_extent_last for next call */
	extent->fe_logical = logical;
	extent->fe_physical = phys;
	extent->fe_length = len;
	extent->fe_flags = flags;
	extent->fe_lun = lun;

copy:
	if (copy_to_user(fieinfo->fi_extents_start, extent, sizeof(*extent)))
		return -EFAULT;

	fieinfo->fi_extents_start += sizeof(*extent);
	fieinfo->fi_extents_mapped++;

	/* Don't return 1 if we are using the last extent slot as we may
	 * be able to merge with the next extent, and we check above if
	 * the next extent won't fit into the array.
	 */

	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
}
EXPORT_SYMBOL(fiemap_fill_next_extent);

> +static int ioctl_fiemap(struct file *filp, unsigned long arg)
> +{
> +	struct fiemap fiemap;
> +	u64 len;
> +	u32 incompat_flags;
> +	struct fiemap_extent_info fieinfo = {0, };
> +	struct inode *inode = filp->f_path.dentry->d_inode;
> +	struct super_block *sb = inode->i_sb;
> +	int error = 0;
> +
> +	if (!inode->i_op->fiemap)
> +		return -EOPNOTSUPP;
> +
> +	if (copy_from_user(&fiemap, (struct fiemap __user *) arg,
> +			   sizeof(struct fiemap)))
> +		return -EFAULT;
> +
> +	/*
> +	 * The VFS does basic sanity checks on the flag
> +	 * value. Individual file systems can also reject otherwise
> +	 * 'valid' flags by returning -EBADR from ->fiemap.
> +	 */
> +	incompat_flags = fiemap.fm_flags & ~FIEMAP_FLAGS_COMPAT;
> +	if (incompat_flags)
> +		goto out_bad_flags;

Please remove this, and leave the checking for the individual filesystems.
They all need to do it anyways, because they can't depend on the check here
never changing or always being updated when this check does.  At best it is
redundant with the check in the filesystem, at worst it will lead to bugs
in the filesystem-specific code due to inconsistent checks.

> +#define FIEMAP_FLAGS_COMPAT	(FIEMAP_FLAG_NUM_EXTENTS|FIEMAP_FLAG_SYNC|   \
> +				 FIEMAP_FLAG_HSM_READ|FIEMAP_FLAG_XATTR|     \
> +				 FIEMAP_FLAG_LUN_ORDER)

This isn't needed anymore.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html