On Fri, Nov 30, 2018 at 01:00:52PM -0500, Ric Wheeler wrote:
> On 11/30/18 7:55 AM, Dave Chinner wrote:
> >On Thu, Nov 29, 2018 at 06:53:14PM -0500, Ric Wheeler wrote:
> >>Other file systems also need to
> >>accommodate/probe behind the fictitious visible storage device
> >>layer... Specifically, is there something we can add per block
> >>device to help here? Number of independent devices
> >That's how mkfs.xfs used to do stripe unit/stripe width calculations
> >automatically on MD devices back in the 2000s. We got rid of that
> >for more generally applicable configuration information such as
> >minimum/optimal IO sizes so we could expose equivalent alignment
> >information from lots of different types of storage device....
> >
> >>or a map of
> >>those regions?
> >Not sure what this means or how we'd use it.
> >Dave.
>
> What I was thinking of was a way of giving up a good outline of how
> many independent regions that are behind one "virtual" block device
> like a ceph rbd or device mapper device. My assumption is that we
> are trying to lay down (at least one) allocation group per region.
>
> What we need to optimize for includes:
>
> * how many independent regions are there?
>
> * what are the boundaries of those regions?
>
> * optimal IO size/alignment/etc
>
> Some of that we have, but the current assumptions don't work well
> for all device types.

Oh, so essentially "independent regions" of the storage device. I
wrote this in 2008:

http://xfs.org/index.php/Reliable_Detection_and_Repair_of_Metadata_Corruption#Failure_Domains

This was derived from the ideas in prototype code I wrote in ~2007
to try to optimise file layout and load distribution across linear
concats of multi-TB RAID6 luns. Some of that work was published long
after I left SGI:

https://marc.info/?l=linux-xfs&m=123441191222714&w=2

Essentially, the independent regions were called "Logical Extension
Groups", or "legs" of the filesystem, and each leg would be an
aggregation of the AGs in that region.

The concept was that we'd move the geometry information from the
superblock into the legs, so we could have different AG geometry
optimised for each independent leg of the filesystem. e.g. the SSD
region could have numerous small AGs, while the large, contiguous
RAID6 part could have maximally sized AGs or even make use of the RT
allocator for free space management instead of the AG/btree
allocator.

Basically it was seen as a mechanism for getting rid of needing to
specify block devices as command line or mount options.

Fundamentally, though, it was based on the concept that Linux would
eventually grow an interface for the block device/volume manager to
tell the filesystem where the independent regions in the device
were(*), but that's not something that has ever appeared.

If you can provide an independent region map in an easy to digest
format (e.g. a set of {offset, len, geometry} tuples), then we can
obviously make use of it in XFS....

Cheers,

Dave.

(*) Basically provide a linux version of the functionality Irix
volume managers had provided filesystems since the late 80s....

-- 
Dave Chinner
david@xxxxxxxxxxxxx
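
For illustration only, here is a rough sketch of what a set of
{offset, len, geometry} tuples could look like and how a filesystem
might turn it into a per-region AG count. Nothing here is an existing
kernel or XFS interface: the Region type, the ag_size hint standing
in for "geometry", the plan_ags() helper and the example sizes are
all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One independent region of a virtual block device (hypothetical)."""
    offset: int   # byte offset of the region within the block device
    length: int   # length of the region in bytes
    ag_size: int  # assumed geometry hint: preferred AG size in bytes


def plan_ags(regions):
    """Return (region, ag_count) pairs, with at least one AG per region.

    Regions are processed in offset order and must not overlap.
    """
    plan = []
    end = 0
    for region in sorted(regions, key=lambda r: r.offset):
        if region.offset < end:
            raise ValueError("regions overlap")
        end = region.offset + region.length
        # Ceiling division, so a trailing partial AG is still counted.
        count = max(1, -(-region.length // region.ag_size))
        plan.append((region, count))
    return plan


GiB = 1 << 30
TiB = 1 << 40

# Made-up example map: a small fast region with many small AGs,
# followed by a large region with big AGs.
region_map = [
    Region(offset=0, length=256 * GiB, ag_size=16 * GiB),
    Region(offset=256 * GiB, length=8 * TiB, ag_size=1 * TiB),
]

for region, count in plan_ags(region_map):
    print(f"offset={region.offset} len={region.length} ags={count}")
```

With the example map this plans 16 AGs for the first region and 8 for
the second. A real geometry field would presumably also need to carry
things like minimum/optimal IO sizes, which the sketch leaves out.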