Re: [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl

On Tuesday 27 May 2008, jim owens wrote:
> For what it is worth, a few comments from a newbie who has
> experience with a non-linux filesystem that has a similar API
> and supports files spread across multiple devices.
>
> Mark Fasheh wrote:
> > * FIEMAP_FLAG_LUN_ORDER
> > If the file system stripes file data, this will return contiguous
> > regions of physical allocation, sorted by LUN. Logical offsets may not
> > make sense if this flag is passed. If the file system does not support
> > multiple LUNs, this flag will be ignored.
>
> This should return an error (ENOTSUPPORTED ?) if the FS does
> not support multiple devices OR does not support sort-by-lun-order
> so the caller does not count on the info being sorted.  Even an FS
> that supports multiple devices per file may be unable to sort it
> by on-disk-order without consuming an ugly set of resources.

That's a good point; I couldn't provide 100% sorted output even if I wanted
to.
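
A minimal sketch of the behaviour suggested above: reject a request that
carries a flag the filesystem cannot honour, rather than silently ignoring
it. The flag names, their values, and the choice of EOPNOTSUPP are all
assumptions made for illustration; the thread leaves the exact errno open
and the RFC patch may define things differently.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only (not taken from the patch). */
#define EX_FLAG_SYNC      0x1u
#define EX_FLAG_LUN_ORDER 0x2u

/*
 * Return 0 when every requested flag is in the supported set,
 * otherwise -EOPNOTSUPP so the caller never assumes sorted output.
 */
static int ex_check_flags(uint32_t requested, uint32_t supported)
{
    if (requested & ~supported)
        return -EOPNOTSUPP;
    return 0;
}
```

With this shape, a filesystem that cannot sort by device simply leaves the
LUN_ORDER bit out of its supported mask and the caller sees an error.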

>
> Christoph Hellwig wrote:
> >>	__u32	fe_lun;	   /* logical device number for extent (starting at 0)*/
> >
> > Again this lun thing is horribly ill-defined.  There is no such thing
> > as a logical device number in our filesystem terminology.
>
> I agree that LUN is confusing.  In my opinion the words "logical"
> and "number" are overused and meaningless.  As Brad suggested,
> "device" would be preferable, or "unit", but unfortunately every
> word I can think of has some other definition too :)
>
> Our term was "volume"... an awful designation.
>
> Chris Mason wrote:
> > For btrfs I would return the logical extents via fiemap (just like the
> > file were on lvm) and make btrfs specific ioctls for details about where
> > the file actually lived.
> >
> > fiemap alone isn't a great way to describe raid levels or complex storage
> > topologies.  To include physical information I would also have to encode
> > the raid level used and information about all the devices the data is
> > replicated on (raid1/10)
>
> fiemap by itself is useful for programs that want to determine
> how fragmented a file is or where sparse areas are to skip.

Yes, and since it has no concurrency semantics, use outside of that quickly
gets difficult.  FIBMAP is used by lilo, and reiserfs needs a special ioctl
that says i've-called-fibmap-please-don't-move-these-bytes, which prevents
tail packing.
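
As a rough sketch of the fragmentation and sparse-area use case mentioned
above, the following counts holes and physically discontiguous runs in a
list of extents that has already been fetched and sorted by logical offset.
The struct and function names are hypothetical stand-ins, not the layout
from the patch, and the result is only a snapshot: nothing here stops the
filesystem from moving data afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record; all values are byte offsets and lengths. */
struct ex_extent {
    uint64_t logical;   /* offset within the file */
    uint64_t physical;  /* offset on the underlying storage */
    uint64_t length;    /* length of the extent */
};

/* Count gaps in logical space between consecutive extents (sparse areas). */
static size_t ex_count_holes(const struct ex_extent *e, size_t n)
{
    size_t holes = 0;

    for (size_t i = 1; i < n; i++) {
        if (e[i].logical > e[i - 1].logical + e[i - 1].length)
            holes++;
    }
    return holes;
}

/* Count runs of extents that are contiguous on storage. */
static size_t ex_count_fragments(const struct ex_extent *e, size_t n)
{
    size_t frags = (n > 0) ? 1 : 0;

    for (size_t i = 1; i < n; i++) {
        if (e[i].physical != e[i - 1].physical + e[i - 1].length)
            frags++;
    }
    return frags;
}
```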

>
> At least one more generic API is needed to enumerate the device number
> to device (path name, inode, socket, ... ?).  In our case this was only
> used for clusters.
>
> For the complex case you describe, it might be possible to have
> an "enumerate" api that could be used to traverse each layer for
> more detail.  I hope this is done generically by someone.
>

It would be especially interesting if the enumerate API actually went all the 
way down to the lvm/md layers as well.

> A final thought on this:
> > 	__u32	fe_lun;	   /* logical device number for extent (starting at 0)*/
>
> While the flags field can be used to tell the validity of this
> number, we found that starting at 0 was not a good practice.
> We started at 1 so 0 was always a not-valid.  One way this can
> be useful is if you have delayed allocation, you can indicate
> "intended device" with a non-0 number.  Of course other values
> such as max_int could be termed "invalid" instead.

I use 0 as not-valid as well.  The original intent was that 0 meant
logical-block-number, signaling that additional lookups were needed, but I
haven't found a good use case for that yet.
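
One way to picture the convention discussed above, with 0 reserved as
not-valid and a non-zero number on a delayed-allocation extent read as the
intended device. Every name and flag value below is hypothetical and chosen
only for this sketch.

```c
#include <stdint.h>

#define EX_DEV_NONE        0u    /* reserved: no valid device number */
#define EX_EXTENT_DELALLOC 0x1u  /* hypothetical "not yet allocated" flag */

/* Hypothetical per-extent fields relevant to the device number. */
struct ex_extent_dev {
    uint32_t flags;
    uint32_t dev;    /* device numbers start at 1; 0 is never a device */
};

enum ex_dev_state {
    EX_DEV_UNKNOWN,    /* dev == 0: nothing can be said about placement */
    EX_DEV_INTENDED,   /* delayed allocation: where the data should land */
    EX_DEV_ALLOCATED   /* data already lives on this device */
};

/* Classify how much the device number of an extent can be trusted. */
static enum ex_dev_state ex_classify_dev(const struct ex_extent_dev *e)
{
    if (e->dev == EX_DEV_NONE)
        return EX_DEV_UNKNOWN;
    if (e->flags & EX_EXTENT_DELALLOC)
        return EX_DEV_INTENDED;
    return EX_DEV_ALLOCATED;
}
```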

>
> Another point to document is whether this number is a contiguous
> series (1, 2, 3,... N) defining the location based on the current
> device list or is possibly a sparse (1, 2, 6) series because the
> FS tracks devices that have been removed.  In our implementation
> both views were present for different consumers.  The sparse
> series was native and the contiguous series a translation.

Interesting, I've been presenting the sparse representation only.
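
To make the two views concrete, here is a small sketch of translating a
sparse native device number into the contiguous series, assuming the caller
already holds the list of currently live native numbers in ascending order.
The function name and the use of 0 for "not present" are assumptions that
follow the not-valid convention discussed earlier.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Map a native (possibly sparse) device number to its 1-based position
 * among the live devices.  Returns 0 if the device is not in the list,
 * for example because it has been removed.
 */
static uint32_t ex_contiguous_id(const uint32_t *live_ids, size_t n,
                                 uint32_t native_id)
{
    for (size_t i = 0; i < n; i++) {
        if (live_ids[i] == native_id)
            return (uint32_t)(i + 1);
    }
    return 0;
}
```

With live devices (1, 2, 6), native number 6 maps to 3 in the contiguous
view, while a removed device such as 3 maps to 0.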

-chris