Re: [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl

On May 25, 2008  15:42 -0400, Christoph Hellwig wrote:
> On Sat, May 24, 2008 at 05:01:48PM -0700, Mark Fasheh wrote:
> > * FIEMAP_FLAG_HSM_READ
> > If the extent is offline, retrieve it before mapping and do not flag
> > it as FIEMAP_EXTENT_SECONDARY. This flag has no effect if the file
> > system does not support HSM.
> 
> Given that there's no HSM support in mainline this should not be added.
> It'll be useful once we add proper HSM support, though :)

This was added at the request of David for XFS, because the XFS bmap
ioctl defaults to reading in extents from HSM.  I don't have any
attachment to it myself.

> > * FIEMAP_FLAG_LUN_ORDER
> > If the file system stripes file data, this will return contiguous
> > regions of physical allocation, sorted by LUN. Logical offsets may not
> > make sense if this flag is passed. If the file system does not support
> > multiple LUNs, this flag will be ignored.
> 
> A LUN doesn't make any sense in filesystem context.  That's a
> scsi-centric acronym that doesn't even make sense in a scsi-centric
> filesystem universe because a LUN can of course contain multiple
> partitions.  It's also extremely ill-defined when using volume managers.

What else do you propose calling this?  It isn't a LUN in the SCSI sense
of course, but there is definitely a need to be able to identify multiple
disks.  Regardless of whether there is a single disk or multiple disks
involved, it is generally called a LUN.  It is better than calling it
a "disk" or a "partition".

> There are also no filesystems that actually support a single file on
> multiple devices in mainline, the only filesystem that supports multiple
> data devices at all (XFS) requires each file to be on a single device.
> 
> Once we have a filesystem with real multiple data device support like
> btrfs or a future XFS version we can worry about this and define
> a different ioctl for it.

I don't see why we need a different ioctl for mapping extents on a
filesystem that supports direct access to multiple disks.  Having one
mechanism that returns the file mapping is much simpler for user
space applications (filefrag, cp, tar, gzip, etc) than having to use
different ioctls for different backing filesystems.
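
To illustrate why one mapping format helps tools like filefrag, here is
a minimal sketch of a fragment counter that works only on the returned
extent array and never needs to know which filesystem produced it.  The
struct mirrors the fiemap_extent definition quoted in this thread (with
<stdint.h> types standing in for __u64/__u32); count_fragments() is a
hypothetical helper, not part of the patch, and it assumes the extents
arrive sorted by logical offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct proposed in the RFC (assumption:
 * same field order, fixed-width types instead of __u64/__u32). */
struct fiemap_extent {
	uint64_t fe_logical;	/* logical offset in bytes */
	uint64_t fe_physical;	/* physical offset in bytes */
	uint64_t fe_length;	/* length in bytes */
	uint32_t fe_flags;	/* FIEMAP_EXTENT_* flags */
	uint32_t fe_lun;	/* device index, starting at 0 */
};

/* Hypothetical helper: count runs of physically contiguous extents.
 * A new fragment starts when the device changes or when the physical
 * offset does not follow directly from the previous extent. */
static size_t count_fragments(const struct fiemap_extent *ext, size_t n)
{
	size_t i, frags = 0;

	assert(ext != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (i == 0 ||
		    ext[i].fe_lun != ext[i - 1].fe_lun ||
		    ext[i].fe_physical !=
		    ext[i - 1].fe_physical + ext[i - 1].fe_length)
			frags++;
	}
	return frags;
}
```

The same loop would serve a local filesystem, where every extent has
the same device index, and a striped one, where the index changes
between extents.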

> > Each extent is described by a single fiemap_extent structure as
> > returned in fm_extents.
> > 
> > struct fiemap_extent {
> > 	__u64	fe_logical;/* logical offset in bytes for the start of
> > 			    * the extent */
> > 	__u64	fe_physical; /* physical offset in bytes for the start
> > 			      * of the extent */
> > 	__u64	fe_length; /* length in bytes for the extent */
> > 	__u32	fe_flags;  /* returned FIEMAP_EXTENT_* flags for the extent */
> > 	__u32	fe_lun;	   /* logical device number for extent (starting at 0)*/
> 
> Again this lun thing is horribly ill-defined.  There is no such thing
> as a logical device number in our filesystem terminology.

Propose a better name then, but the need for it will not go away.  This
is needed for Lustre, btrfs, pNFS, etc.  The whole point of developing
this API and getting input from all of the main filesystems was to have
a single common interface that could be used by all filesystems.
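
As a rough sketch of how a consumer might use the per-extent device
index, the code below sorts an extent array by device and then by
physical offset, which is roughly the ordering FIEMAP_FLAG_LUN_ORDER is
described as returning.  The struct mirrors the definition quoted
above; sort_by_device() and its comparator are hypothetical names used
only for illustration, and nothing here is taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace mirror of the struct proposed in the RFC (assumption:
 * same field order, fixed-width types instead of __u64/__u32). */
struct fiemap_extent {
	uint64_t fe_logical;
	uint64_t fe_physical;
	uint64_t fe_length;
	uint32_t fe_flags;
	uint32_t fe_lun;	/* device index, starting at 0 */
};

/* Order by device index first, then by physical offset on that device. */
static int cmp_dev_then_phys(const void *a, const void *b)
{
	const struct fiemap_extent *x = a, *y = b;

	if (x->fe_lun != y->fe_lun)
		return x->fe_lun < y->fe_lun ? -1 : 1;
	if (x->fe_physical != y->fe_physical)
		return x->fe_physical < y->fe_physical ? -1 : 1;
	return 0;
}

/* Hypothetical helper: sort the array in place and return the number
 * of distinct devices that hold data for the file. */
static unsigned int sort_by_device(struct fiemap_extent *ext, size_t n)
{
	unsigned int ndev = 0;
	size_t i;

	assert(ext != NULL || n == 0);
	if (n == 0)
		return 0;
	qsort(ext, n, sizeof(*ext), cmp_dev_then_phys);
	for (i = 0; i < n; i++)
		if (i == 0 || ext[i].fe_lun != ext[i - 1].fe_lun)
			ndev++;
	return ndev;
}
```

Whatever the field ends up being called, some per-extent device
identifier is what makes this kind of grouping possible.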

> > struct fiemap_extent_info {
> > 	unsigned int	fi_flags;		/* Flags as passed from user */
> > 	unsigned int	fi_extents_mapped;	/* Number of mapped extents */
> > 	unsigned int	fi_extents_max;		/* Size of fiemap_extent array */
> > 	char		*fi_extents_start;	/* Start of fiemap_extent array */
> > };
> 
> Why is this passed as a structure instead of individual arguments?

Saves on passing this around as arguments on the stack?  Also, for ext4
there is an iterator function which needs a private data struct passed,
and it doesn't make sense to require duplicating all of this information
again.

> Also why isn't fi_extents_start properly typed?

I was wondering about that; I'm not sure why Mark implemented it that
way.  I would have thought that it should be a struct fiemap_extent *.
I thought maybe to allow for misaligned userspace pointers, but I'm
not sure.

> > If the request has the FIEMAP_FLAG_NUM_EXTENTS flag set, then calling
> > this helper is not necessary and fi_extents_mapped can be set
> > directly.
> 
> Sounds like the count number of extents request should be a separate
> ioctl and separate filesystem entry point instead of overloading FIEMAP.

I don't see that at all.  The operations that the filesystem has to do
are basically the same whether it is counting extents or returning them.
All that would result from having separate ioctl and filesystem methods
would be a lot of code duplication.

The fiemap_fill_next_extents() call will handle the NUM_EXTENTS operation
internally, and the filesystem code doesn't need to special case this
at all.  The only time the NUM_EXTENTS case would be handled by the
filesystem specially would be if it tracks the count of extents itself
for some reason.
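
A minimal userspace sketch of that idea follows.  It is not the code
from the patch: sketch_fill_next_extent() and SKETCH_FLAG_NUM_EXTENTS
(including its value) are placeholders, memcpy() stands in for the copy
to user space, and the return convention (1 once the array is full) is
an assumption.  The two structs mirror the definitions quoted earlier
in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder flag value for illustration; not taken from the patch. */
#define SKETCH_FLAG_NUM_EXTENTS 0x1u

struct fiemap_extent {
	uint64_t fe_logical;
	uint64_t fe_physical;
	uint64_t fe_length;
	uint32_t fe_flags;
	uint32_t fe_lun;
};

struct fiemap_extent_info {
	unsigned int fi_flags;		/* flags as passed from user */
	unsigned int fi_extents_mapped;	/* number of mapped extents */
	unsigned int fi_extents_max;	/* size of fiemap_extent array */
	char *fi_extents_start;		/* start of fiemap_extent array */
};

/* Called by the filesystem once per extent.  Returns 0 to continue
 * and 1 once the destination array is full.  The counting case is
 * handled here, so callers need no special path for it. */
static int sketch_fill_next_extent(struct fiemap_extent_info *info,
				   uint64_t logical, uint64_t physical,
				   uint64_t length, uint32_t flags,
				   uint32_t lun)
{
	struct fiemap_extent ext;

	assert(info != NULL);

	/* Count-only request: bump the counter, copy nothing. */
	if (info->fi_flags & SKETCH_FLAG_NUM_EXTENTS) {
		info->fi_extents_mapped++;
		return 0;
	}

	if (info->fi_extents_mapped >= info->fi_extents_max)
		return 1;

	ext.fe_logical = logical;
	ext.fe_physical = physical;
	ext.fe_length = length;
	ext.fe_flags = flags;
	ext.fe_lun = lun;

	/* Byte-wise copy into the untyped destination buffer. */
	memcpy(info->fi_extents_start +
	       (size_t)info->fi_extents_mapped * sizeof(ext),
	       &ext, sizeof(ext));
	info->fi_extents_mapped++;

	return info->fi_extents_mapped == info->fi_extents_max;
}
```

With a helper shaped like this, the filesystem's extent walk is the
same for both request types; only the helper looks at the flag.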

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
