Re: [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl

On May 29, 2008  09:01 -0400, Christoph Hellwig wrote:
> On Wed, May 28, 2008 at 10:09:31AM -0600, Andreas Dilger wrote:
> > ... but I don't think it should necessarily be _required_ to return a
> > real "dev_t" (major, minor) device.  For network filesystems this is
> > meaningless.  If it is possible for FIEMAP_EXTENT_NET to signal that the
> > device is not a local/physical device (where a dev_t has no meaning),
> > and simply allow an enumeration [0, 1, 2, ...] of the logical devices
> > then I think this is reasonable.  The mapping of logical devices to
> > servers is available separately with a Lustre-specific ioctl.
> > 
> > This passes more information for filesystems that have local devices
> > while not breaking the functionality for network filesystems and could
> > be used as an efficient replacement for lilo's use of FIBMAP.
> 
> A dev_t actually means something for the only in-tree users of
> this interface, so there's no point making this interface worse for
> some long-term out of tree code.  And it's not like you simply can't
> allow multiple anonymous blockdevices for your networked filesystems
> similar to the one used for st_dev already.

But requiring that 1500 anonymous blockdevices (== number of storage
targets) be created at mount time, while exporting a random-valued dev_t
for network filesystems that varies across reboots and is inconsistent
across clients, just for the possibility that the client is going to do
FIEMAP, isn't making the interface better either...

Getting devices of [0x1908afed, 0x4058204b] back from FIEMAP of a file
on one client, and [0x4bac5821, 0x0abefd63] on another client is pretty
useless compared to devices [2, 4], which have very clear meanings,
will always be the same across all clients, and the same across reboots.

> > For RAID1/10 you can return multiple logical->physical extent mappings
> > for the same logical range of the file with different "device" IDs.  You
> > could do the same for RAID5 returning each of the data and parity chunks
> > with "NO_DIRECT" if desired (maybe only on the parity extent, or don't
> > return the parity extent at all).  The spec does not require that the
> > returned extents be non-overlapping.
> 
> Umm, no.  That just makes the interface too complicated.  I can bet
> you that userspace programmers will generally only test their code
> with simple filesystems and hell will break loose when they get these
> multiple ranges.  Especially as that's a very unnatural interface.

The metadata information isn't exposed to callers by default; they have
to request it explicitly with e.g. FIEMAP_FLAG_METADATA.  For the most
common use cases, applications/users will care about:
a) for cp/tar/dd/etc they only want to know where there are holes.  This
   is available in the most simple instance of FIEMAP (no flags).
b) for "fiemap" the user will want to know whether there are large or
   small contiguous allocations/fragmentation, or just the extent count.
c) for sophisticated users (e.g. filesystem developers, performance tuning)
   they want to know the extent information, the metadata layout, and
   possibly the mapping all the way down to the platters.
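
As a rough illustration of case (a), here is a minimal sketch of how a
tool might derive the holes from the extents returned by a no-flags
FIEMAP call.  The (logical_offset, length) tuples and the find_holes()
name are hypothetical stand-ins for illustration only; they are not the
structures or calls of the proposed interface.

```python
def find_holes(extents, file_size):
    """Return (offset, length) holes in [0, file_size).

    extents: iterable of hypothetical (logical_offset, length) byte
    ranges.  They may arrive unsorted and may overlap, since the spec
    does not require non-overlapping extents.
    """
    holes = []
    pos = 0  # first byte not yet known to be mapped
    for logical, length in sorted(extents):
        if logical >= file_size:
            break  # nothing past EOF matters to a copy tool
        if logical > pos:
            # gap between the mapped region so far and this extent
            holes.append((pos, logical - pos))
        pos = max(pos, logical + length)
    if pos < file_size:
        # trailing hole up to end of file
        holes.append((pos, file_size - pos))
    return holes
```

A caller such as cp could then skip the returned ranges instead of
reading zeros, e.g. find_holes([(0, 4096), (8192, 4096)], 16384).
Overlapping extents (as in the RAID1 case discussed above) are simply
absorbed by the max() step, so a simple tool need not care about them.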

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
