On 11/29/18 4:48 PM, Dave Chinner wrote:
On Thu, Nov 29, 2018 at 08:53:39AM -0500, Ric Wheeler wrote:
On 10/6/18 8:14 PM, Eric Sandeen wrote:
On 10/6/18 6:20 PM, Dave Chinner wrote:
Can you give an example of a use case that would be negatively affected
if this heuristic was switched from "sunit" to "sunit < swidth"?
Any time you only know a single alignment characteristic of the
underlying multi-disk storage. e.g. hardware RAID0/5/6 that sets
iomin = ioopt, multi-level RAID constructs where only the largest
alignment requirement is exposed, RAID1 devices exposing their chunk
size, remote replication chunk alignment (because remote rep. is
slow and so we need more concurrency to keep the pipeline full),
etc.
So the tl;dr here is "given any iomin > 512, we should infer low seek
latency and parallelism and adjust geometry accordingly?"
-Eric
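[For readers outside the thread: the heuristic being debated can be sketched roughly as below. This is an illustrative sketch only, not mkfs.xfs's actual code; the function name and the "striped"/"aligned-only"/"plain" labels are invented for this example. The inputs are the minimum and optimal I/O sizes the block device reports (iomin/ioopt, in bytes, e.g. from /sys/block/<dev>/queue/minimum_io_size and optimal_io_size).]

```shell
#!/bin/sh
# Illustrative sketch of the geometry inference under discussion.
# classify_geometry IOMIN IOOPT
classify_geometry() {
    iomin=$1
    ioopt=$2
    if [ "$iomin" -gt 512 ] && [ "$iomin" -lt "$ioopt" ]; then
        # Distinct stripe unit and stripe width: classic software
        # RAID presentation (sunit < swidth).
        echo "striped"
    elif [ "$iomin" -gt 512 ]; then
        # iomin == ioopt, e.g. hardware RAID exposing only a single
        # alignment characteristic. The question in the thread is
        # whether this case should also imply low seek latency and
        # parallelism (more AGs).
        echo "aligned-only"
    else
        echo "plain"
    fi
}
```

Under Eric's "tl;dr" reading, both the "striped" and "aligned-only" outcomes would get elevated concurrency, which is exactly the inference Dave questions below for aggregated/shared devices.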
Chiming in late here, but I do think that every decade or two (no
disrespect to xfs!), it is worth having a second look at how the
storage has changed under us.
The workload that has lots of file systems pounding on a shared
device for example is one way to lay out container storage.
The problem is that defaults can't cater for every use case.
And in this case, we've got nothing to tell us that this is
aggregated/shared storage rather than "the filesystem owns the
entire device".
No argument about documenting how to fix this with command line
tweaks for now, but maybe this would be a good topic for the next
LSF/MM shared track of file & storage people to debate?
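[For reference, the command-line tweaks being alluded to are existing, documented mkfs.xfs options; the device name and the specific values below are placeholders, not recommendations.]

```shell
# Force a higher AG count on aggregated/shared storage where the
# defaults would pick too few:
mkfs.xfs -d agcount=32 /dev/sdX

# Override the detected stripe geometry explicitly
# (su = stripe unit in bytes, sw = number of data stripes):
mkfs.xfs -d su=64k,sw=8 /dev/sdX

# Or ignore any detected alignment entirely:
mkfs.xfs -d noalign /dev/sdX
```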
Doubt it - this is really only an XFS problem at this point.
i.e. if we can't infer what the user wants from existing
information, then I don't see how the storage is going to be able to
tell us anything different, either. i.e. somewhere in the stack the
user is going to have to tell the block device that this is
aggregated storage.
But even then, if it's aggregated solid state storage, we still want
to make use of the concurrency of an increased AG count because there
is no seek penalty like spinning drives end up with. Or if the
aggregated storage is thinly provisioned, the AG count of the
filesystem just doesn't matter because the IO is going to be massively
randomised (i.e. take random seek penalties) by the thinp layout.
So there's really no good way of "guessing" whether aggregated
storage should or shouldn't use elevated AG counts even if the
storage says "this is aggregated storage". The user still has to
give us some kind of explicit hint about how the filesystem should
be configured.
What we need is for a solid, reliable detection heuristic to be
suggested by the people that need this functionality before there's
anything we can talk about.
Cheers,
Dave.
I think that is exactly the kind of discussion that the shared file/storage
track is good for. Other file systems also need to accommodate/probe behind the
fictitious visible storage device layer... Specifically, is there something we
can add per block device to help here? Number of independent devices or a map
of those regions?
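[To make the question concrete: a per-device hint might look something like the sketch below. This is purely hypothetical; no "independent_regions" queue attribute exists today, and the name is invented here only to illustrate the shape of the interface, read the same way mkfs already reads iomin/ioopt from sysfs.]

```shell
#!/bin/sh
# Hypothetical per-block-device hint: number of independently
# seeking backing devices/regions behind this device.
# independent_regions DEVNAME
independent_regions() {
    f="/sys/block/$1/queue/independent_regions"   # hypothetical path
    if [ -r "$f" ]; then
        cat "$f"
    else
        # No hint exported: fall back to today's assumption that
        # the filesystem owns one device.
        echo 1
    fi
}
```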
Regards,
Ric