Re: [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl

On Wednesday 28 May 2008, Andreas Dilger wrote:
> On May 27, 2008  13:19 -0400, Chris Mason wrote:
> > On Tuesday 27 May 2008, jim owens wrote:
> > > For what it is worth, a few comments from a newbie who has
> > > experience with a non-linux filesystem that has a similar API
> > > and supports files spread across multiple devices.
> > >
> > > Mark Fasheh wrote:
> > > > * FIEMAP_FLAG_LUN_ORDER
> > > > If the file system stripes file data, this will return contiguous
> > > > regions of physical allocation, sorted by LUN. Logical offsets may
> > > > not make sense if this flag is passed. If the file system does not
> > > > support multiple LUNs, this flag will be ignored.
> > >
> > > This should return an error (ENOTSUPPORTED ?) if the FS does
> > > not support multiple devices OR does not support sort-by-lun-order
> > > so the caller does not count on the info being sorted.  Even an FS
> > > that supports multiple devices per file may be unable to sort it
> > > by on-disk-order without consuming an ugly set of resources.
> >
> > That's a good point, I couldn't provide 100% sorted output even if I
> > wanted to.
>
> I'm OK with this also.  The only reason I thought "simple" filesystems
> (i.e. single-lun) should ignore FLAG_LUN_ORDER is so that tools like
> filefrag can always try with LUN_ORDER and in most cases still get a
> mapping returned.  If the filesystem doesn't care about LUN_MAPPING, no
> harm done, because all of the extents live on a single LUN anyway.  If
> a multi-device filesystem doesn't want to implement LUN_ORDER, returning
> -EBADR is perfectly acceptable because the application will retry without
> the unsupported flags (LUN_ORDER in this case) and get the logical file
> offset order data returned.
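The fallback described above can be sketched as a small simulation (not the real ioctl). The filesystem side rejects any flag outside its supported mask and reports the rejected flags along with -EBADR. The caller strips those flags and retries. The FIEMAP that was eventually merged works this way: it returns -EBADR and leaves the offending flags in fm_flags. Here, though, the bit values and the helper names fs_check_flags and map_with_fallback are hypothetical, and FIEMAP_FLAG_LUN_ORDER is only the flag proposed in this RFC.

```python
import errno

# Bit values are assumptions for this sketch. FIEMAP_FLAG_LUN_ORDER is the
# flag proposed in this RFC, not a constant from any released header.
FIEMAP_FLAG_SYNC = 0x1
FIEMAP_FLAG_LUN_ORDER = 0x8

# EBADR is Linux-specific; fall back to 53 (asm-generic errno.h) elsewhere.
EBADR = getattr(errno, "EBADR", 53)


def fs_check_flags(requested, supported):
    """Hypothetical filesystem side: accept the request if every flag is
    understood, else return -EBADR plus the flags that were rejected."""
    incompat = requested & ~supported
    if incompat:
        return -EBADR, incompat
    return 0, requested


def map_with_fallback(wanted, fs_supported):
    """Hypothetical caller: ask with every wanted flag; on -EBADR drop the
    rejected flags and retry once. Returns the flags finally accepted."""
    ret, flags = fs_check_flags(wanted, fs_supported)
    if ret == -EBADR:
        ret, flags = fs_check_flags(wanted & ~flags, fs_supported)
    if ret != 0:
        raise OSError(-ret, "FIEMAP flags rejected")
    return flags


want = FIEMAP_FLAG_SYNC | FIEMAP_FLAG_LUN_ORDER
# A filesystem that implements LUN_ORDER honors both flags.
print(hex(map_with_fallback(want, FIEMAP_FLAG_SYNC | FIEMAP_FLAG_LUN_ORDER)))  # -> 0x9
# A filesystem that rejects LUN_ORDER: the caller falls back to logical order.
print(hex(map_with_fallback(want, FIEMAP_FLAG_SYNC)))  # -> 0x1
```

Because the error carries the rejected flags, a tool like filefrag does not need to know in advance which filesystems implement LUN_ORDER. One retry is enough.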
>
> For Lustre, it is completely inefficient to return data in non-LUN_ORDER,
> because it is doing RAID-0 striping of the file data across data servers.
> A 100MB 2-stripe file with 1MB stripes would have to return 100 extents,
> even if the file data is allocated contiguously on disk in the backing
> filesystems in two 50MB chunks.  With LUN_ORDER it will return 2 extents
> and the user can see much more clearly that the file is laid out well.
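One way to check the Lustre arithmetic above is to simulate the 100MB, 2-stripe, 1MB-stripe layout and count merged extents in each order. The sketch assumes each data server allocated its 50MB share contiguously. The helper names raid0_chunks and merge_extents are hypothetical, and nothing here calls FIEMAP itself.

```python
MiB = 1 << 20


def raid0_chunks(file_size, stripe_size, stripe_count):
    """Yield (logical_off, lun, lun_off, length) per stripe-sized chunk,
    assuming each LUN's share is allocated contiguously from offset 0 and
    file_size is a multiple of stripe_size."""
    for i in range(file_size // stripe_size):
        lun = i % stripe_count
        lun_off = (i // stripe_count) * stripe_size
        yield i * stripe_size, lun, lun_off, stripe_size


def merge_extents(chunks):
    """Merge consecutive chunks that are physically adjacent on one LUN."""
    extents = []
    for _logical, lun, off, length in chunks:
        last = extents[-1] if extents else None
        if last is not None and last[0] == lun and last[1] + last[2] == off:
            last[2] += length
        else:
            extents.append([lun, off, length])
    return extents


chunks = list(raid0_chunks(100 * MiB, 1 * MiB, 2))
logical_order = merge_extents(chunks)
lun_order = merge_extents(sorted(chunks, key=lambda c: (c[1], c[2])))
print(len(logical_order), len(lun_order))  # -> 100 2
```

In logical order every neighboring chunk sits on the other LUN, so nothing merges. Sorting by (LUN, offset) lets each server's chunks collapse into a single 50MB extent. The merged extents no longer cover one contiguous logical range, which matches the RFC's note that logical offsets may not make sense with LUN_ORDER.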

Ah, so Lustre doesn't have a logical address layer at all?  In my case the 
files contain pointers to contiguous logical extents, and the lower layers of 
the FS figure out whether that is raid0/1/10 or whatever future crud I toss in.

If the logical extents are contiguous it is safe to assume the lower end is 
also contiguous.

[ huge snip ;) ]

> My point of view is that FIEMAP is a file layout visualization API that
> could also be used in certain cases for direct data access.  Since any
> direct access of data returned by FIEMAP is inherently racy (as is
> FIBMAP), I'm less concerned with the mappings being fully consistent,
> and more concerned with providing the maximum amount of information.
>
> Any application using FIEMAP for direct data access (e.g. dump of
> some kind) either has to guard against races itself by verifying the
> mapping again afterward, or for uses like lilo trust that the admin
> is doing the right thing.  That isn't a new issue with FIEMAP vs FIBMAP.
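The "verify the mapping again afterward" approach might look roughly like the sketch below. stable_mapping and the (logical, physical, length) tuples are hypothetical, and the fake reader stands in for real FIEMAP calls. Comparing two snapshots only narrows the race. A mapping that changes and then changes back between the two reads goes unnoticed.

```python
def stable_mapping(get_mapping, use_mapping, max_tries=3):
    """Hypothetical helper: snapshot the extent list, act on it, then
    re-read it; retry if the file was remapped in the meantime."""
    for _ in range(max_tries):
        before = get_mapping()
        result = use_mapping(before)
        if get_mapping() == before:
            return result
    raise RuntimeError("extent mapping kept changing")


# Fake extent lists of (logical, physical, length) tuples: the first read
# sees the old layout, then something (e.g. a defragmenter) moves the file.
_reads = iter([[(0, 4096, 8)], [(0, 9000, 8)], [(0, 9000, 8)], [(0, 9000, 8)]])


def fake_get_mapping():
    return next(_reads)


print(stable_mapping(fake_get_mapping, list))  # -> [(0, 9000, 8)]
```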

So, I'm a big fan of better layout visualization and of creating APIs to 
improve it.  At some point, though, we need to take a step back and ask whether 
those APIs are better left to other tools instead of heaping them all into fiemap.  

The advantage of dropping the LUN support from fiemap and pushing it into a 
new ioctl/syscall is that we can determine the underlying storage topology 
for any logical block on the device, including those underneath md/dm without 
worrying about a backing file.

And then we can get interesting information about stripe widths, preferred IO 
sizes, etc.

[ lots of other stuff that makes good sense snipped too ]

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
