Re: block layer API for file system creation - when to use multidisk mode

Ric Wheeler <ricwheeler@xxxxxxxxx> · Sat, 1 Dec 2018 15:52:31 -0500

On 11/30/18 11:35 PM, Dave Chinner wrote:
On Fri, Nov 30, 2018 at 01:00:52PM -0500, Ric Wheeler wrote:
On 11/30/18 7:55 AM, Dave Chinner wrote:
On Thu, Nov 29, 2018 at 06:53:14PM -0500, Ric Wheeler wrote:
Other file systems also need to
accommodate/probe behind the fictitious visible storage device
layer... Specifically, is there something we can add per block
device to help here? Number of independent devices

That's how mkfs.xfs used to do stripe unit/stripe width calculations
automatically on MD devices back in the 2000s. We got rid of that
for more generally applicable configuration information such as
minimum/optimal IO sizes so we could expose equivalent alignment
information from lots of different types of storage device....
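As a rough illustration of deriving stripe geometry from the generic minimum/optimal IO size hints (rather than from MD-specific knowledge), here is a minimal Python sketch. It assumes the usual MD convention that minimum_io_size is the chunk size and optimal_io_size is the full stripe width, both in bytes; the function names are hypothetical and this is not mkfs.xfs's actual code, only the general idea.

```python
from pathlib import Path

def read_queue_attr(dev: str, name: str) -> int:
    """Read an integer attribute from /sys/block/<dev>/queue/ (e.g. optimal_io_size)."""
    return int(Path("/sys/block", dev, "queue", name).read_text().strip())

def stripe_geometry(min_io: int, opt_io: int, block_size: int = 4096):
    """Map minimum/optimal IO sizes (bytes) to (sunit, swidth) in filesystem blocks.

    Returns None when the hints carry no usable striping information.
    (Hypothetical helper; real mkfs.xfs applies its own checks.)"""
    if min_io <= 0 or opt_io <= 0:
        return None          # device reports no alignment hint
    if opt_io == min_io or opt_io % min_io != 0:
        return None          # no striping, or width is not a whole number of units
    if min_io % block_size != 0:
        return None          # stripe unit not a multiple of the filesystem block size
    return min_io // block_size, opt_io // block_size
```

For example, an MD RAID with a 64 KiB chunk and four data disks would typically report 65536 and 262144, giving a stripe unit of 16 blocks and a stripe width of 64 blocks. The same code works for any device that fills in these two attributes, which is the point of moving to the generic hints.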

or a map of
those regions?

Not sure what this means or how we'd use it.

Dave.

What I was thinking of was a way of giving us a good outline of how
many independent regions there are behind one "virtual" block device
like a ceph rbd or device mapper device. My assumption is that we
are trying to lay down (at least one) allocation group per region.

What we need to optimize for includes:

     * how many independent regions are there?

     * what are the boundaries of those regions?

     * optimal IO size/alignment/etc

Some of that we have, but the current assumptions don't work well
for all device types.
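To make "at least one allocation group per region" concrete, the Python sketch below assumes a region map given as contiguous (offset, length) pairs in filesystem blocks. Because current XFS uses a single AG size for the whole filesystem (only the last AG may be shorter), the sketch looks for one uniform AG size that divides every region length, so no AG straddles a region boundary. The min_ag/max_ag limits are caller-supplied assumptions, not the real mkfs.xfs limits, and the helper names are hypothetical.

```python
from functools import reduce
from math import gcd

def largest_divisor_at_most(n: int, limit: int) -> int:
    """Largest divisor of n that is <= limit (0 if there is none)."""
    best = 0
    i = 1
    while i * i <= n:
        if n % i == 0:
            for d in (i, n // i):
                if best < d <= limit:
                    best = d
        i += 1
    return best

def plan_ags(regions, min_ag: int, max_ag: int):
    """regions: (offset, length) pairs in filesystem blocks, contiguous from block 0.

    Returns (ag_size, ags_per_region) such that every region boundary is also an
    AG boundary, or None if no AG size in [min_ag, max_ag] achieves that."""
    if not regions:
        raise ValueError("empty region map")
    expected = 0
    for offset, length in regions:
        if offset != expected or length <= 0:
            raise ValueError("regions must be non-empty and contiguous from block 0")
        expected = offset + length
    # Each offset is a sum of earlier lengths, so a size that divides every
    # length also divides every offset: AGs of that size never cross a boundary.
    g = reduce(gcd, (length for _, length in regions))
    size = largest_divisor_at_most(g, max_ag)
    if size < min_ag:
        return None
    return size, [length // size for _, length in regions]
```

With regions of 1000 and 3000 blocks and an AG size limit of 800 blocks, this picks 500-block AGs: two in the first region and six in the second. If the region sizes share no large common factor, it returns None, which is one way to see why per-region AG geometry needs something like the "legs" described below.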

Oh, so essentially "independent regions" of the storage device. I
wrote this in 2008:

http://xfs.org/index.php/Reliable_Detection_and_Repair_of_Metadata_Corruption#Failure_Domains

This was derived from the ideas in prototype code I wrote in ~2007
to try to optimise file layout and load distribution across linear
concats of multi-TB RAID6 luns. Some of that work was published
long after I left SGI:

https://marc.info/?l=linux-xfs&m=123441191222714&w=2

Essentially, the independent regions - called "Logical Extension
Groups", or "legs" of the filesystem - would each be an aggregation
of the AGs in that region. The concept was that we'd move the
geometry information from the superblock into the legs, so we could
have a different AG geometry optimised for each independent leg of
the filesystem.

e.g. the SSD region could have numerous small AGs, while the large,
contiguous RAID6 part could have maximally sized AGs or even make use
of the RT allocator for free space management instead of the
AG/btree allocator. Basically it was seen as a mechanism for getting
rid of the need to specify block devices as command line or mount
options.

Fundamentally, though, it was based on the concept that Linux would
eventually grow an interface for the block device/volume manager to
tell the filesystem where the independent regions in the device
were(*), but that's not something that has ever appeared. If you can
provide an independent region map in an easy-to-digest format (e.g. a
set of {offset, len, geometry} tuples), then we can obviously make
use of it in XFS....

Cheers,

Dave.

(*) Basically provide a linux version of the functionality Irix
volume managers had provided filesystems since the late 80s....
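One possible shape for the "{offset, len, geometry} tuples" suggested above is sketched in Python below. The Region fields, and representing per-region geometry as a minimum/optimal IO size pair, are assumptions made for illustration; no such kernel interface is implied. The sketch validates the map (non-empty, non-overlapping, within the device) and answers the question a filesystem allocator would ask: which region does a given offset fall in?

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    offset: int   # start of the region, bytes from the start of the device
    length: int   # size of the region in bytes
    min_io: int   # hypothetical per-region geometry: minimum IO size, bytes
    opt_io: int   # hypothetical per-region geometry: optimal IO size, bytes

class RegionMap:
    def __init__(self, regions, device_size: int):
        regions = sorted(regions, key=lambda r: r.offset)
        end = 0
        for r in regions:
            if r.length <= 0:
                raise ValueError(f"empty region at offset {r.offset}")
            if r.offset < end:
                raise ValueError(f"region at offset {r.offset} overlaps the previous one")
            end = r.offset + r.length
        if end > device_size:
            raise ValueError("region map extends past the end of the device")
        self._regions = regions
        self._starts = [r.offset for r in regions]

    def __len__(self) -> int:
        return len(self._regions)

    def lookup(self, byte_offset: int):
        """Return the region containing byte_offset, or None if it is in a gap."""
        i = bisect_right(self._starts, byte_offset) - 1
        if i < 0:
            return None
        r = self._regions[i]
        return r if byte_offset < r.offset + r.length else None
```

Gaps between regions are tolerated here and simply map to None; whether a real interface should allow them is an open design choice.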

Hi Dave,

This is exactly the kind of thing I think would be useful. We might want to
have a distinct value (like the rotational flag) that indicates this is a device
with multiple "legs", so that normally we query that and don't have to look for
the more complicated information.

Regards,

Ric
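The "cheap flag first, detailed map only if needed" probe suggested above might look like the Python sketch below. Only queue/rotational is a real sysfs attribute; "nr_regions" and "region_map" (one "offset length" pair in bytes per line) are hypothetical names invented for this illustration. The queue directory is passed in so the logic can be exercised against any directory, not just /sys/block/<dev>/queue.

```python
from pathlib import Path

def read_int(path: Path, default: int = 0) -> int:
    """Read an integer sysfs-style attribute, falling back to default if absent."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return default

def is_rotational(queue_dir: Path) -> bool:
    # Real attribute: queue/rotational reads "1" for spinning media, "0" otherwise.
    return read_int(queue_dir / "rotational") == 1

def probe_regions(queue_dir: Path):
    """Return a list of (offset, length) byte ranges, or None for a single-region device."""
    # Cheap check first, analogous to reading queue/rotational.
    # HYPOTHETICAL attribute: "nr_regions" is not an existing sysfs file.
    if read_int(queue_dir / "nr_regions", default=1) <= 1:
        return None
    # Only multi-region devices pay for parsing the detailed map.
    # HYPOTHETICAL file: "region_map", one "offset length" pair per line.
    regions = []
    for line in (queue_dir / "region_map").read_text().splitlines():
        if line.strip():
            offset, length = (int(x) for x in line.split())
            regions.append((offset, length))
    return regions
```

A device that does not export the flag at all is treated as a plain single-region device, so existing hardware and drivers would need no changes.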