Re: raid0 vs. mkfs

Avi Kivity <avi@xxxxxxxxxxxx> · Wed, 30 Nov 2016 00:45:10 +0200

On 11/29/2016 11:14 PM, NeilBrown wrote:
On Mon, Nov 28 2016, Avi Kivity wrote:
If it is easy for the upper layer to break a very large request into a
few very large requests, then I wouldn't necessarily object.
I can't see why it would be hard.  It's simple arithmetic.
That is easy to say before writing the code :-)
It probably is easy for RAID0.  Less so for RAID10.  Even less for
RAID6.

Pick the largest subrange within the input range whose bounds are 0 (mod
stripe-size); TRIM it (for all members); apply the regular algorithm to
the head and tail subranges.  Works for all RAID types.  If the RAID is
undergoing reshaping, exclude the range undergoing reshaping, and treat
the two halves separately.
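
A rough sketch of that arithmetic, in Python for brevity rather than as kernel code.  The name split_trim and the (start, length) tuples are made up for illustration; offsets are assumed to be non-negative integers in the same unit as the stripe size, and the reshape case is left out.

```python
def split_trim(start, length, stripe_size):
    """Split [start, start + length) into (head, middle, tail).

    Each part is a (start, length) tuple.  The middle part begins and
    ends on a stripe boundary, so it could be trimmed on every member
    in one go; head and tail would go through the regular path.
    Hypothetical helper, not an existing md or block-layer function.
    """
    end = start + length
    # Round the start up and the end down to stripe boundaries.
    mid_start = -(-start // stripe_size) * stripe_size
    mid_end = (end // stripe_size) * stripe_size

    if mid_start >= mid_end:
        # No whole stripe fits inside the range: nothing to special-case.
        return (start, length), (start, 0), (end, 0)

    head = (start, mid_start - start)
    middle = (mid_start, mid_end - mid_start)
    tail = (mid_end, end - mid_end)
    return head, middle, tail


if __name__ == "__main__":
    # A 1000-unit range starting at offset 100, with a 256-unit stripe.
    print(split_trim(100, 1000, 256))
```

Whatever the size of the input range, this yields at most three pieces, so a TRIM of a whole array would become one large aligned request plus at most two short unaligned ones.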

But unless it is very hard for the lower layer to merge requests, it
should be doing that too.
Merging has tradeoffs.  When you merge requests R1, R2, ... Rn you make
the latency of request R1 the sum of the latencies of R1..Rn.  You may gain
some efficiency in the process, but that's not going to make up for a
factor of n.  The queuing layer has no way to tell whether the caller is
interested in the latency of individual requests.  By sending large
requests, the caller indicates it's not interested in the latency of
individual subranges.  The queuing layer is still free to internally
split the request to smaller ranges, to satisfy hardware constraints, or
to reduce worst-case latencies for competing request streams.
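
To put numbers on the latency argument, here is a toy model in Python.  It assumes the requests are serviced strictly one after another, and it models the efficiency gain from merging as a single multiplier (merge_gain); both are simplifications, and completion_times is a made-up name, not anything in the block layer.

```python
from itertools import accumulate


def completion_times(service_times, merged, merge_gain=1.0):
    """Return the completion time seen by each request R1..Rn.

    service_times: per-request service time, in arbitrary units.
    merged:        True if R1..Rn are issued as one merged request.
    merge_gain:    assumed speed-up factor from merging (1.0 = none,
                   0.5 = the merged request takes half the total time).
    """
    if merged:
        # Nothing completes until the whole merged request completes.
        total = sum(service_times) * merge_gain
        return [total] * len(service_times)
    # Unmerged: each request completes as soon as its own work is done.
    return list(accumulate(service_times))


if __name__ == "__main__":
    times = [2, 2, 2, 2]
    print(completion_times(times, merged=False))
    print(completion_times(times, merged=True, merge_gain=0.5))
```

In this model R1 completes after 2 units unmerged and after 4 units merged, even with merging assumed to halve the total work; the gap grows with n.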
I would have thought that using plug/unplug to group requests is a
fairly strong statement that they can be handled as a unit if that is
convenient.

It is not.  As an example, consider a read and a few associated 
read-ahead requests submitted in a batch.  The last thing you want is 
for them to be treated as a unit.

Plug/unplug means: I have a bunch of requests here.  Whether they should 
be merged or reordered is orthogonal to whether they are members of a 
batch or not.

So I disagree that all the work should be pushed to the merging layer.
It has less information to work with, so the fewer decisions it has to
make, the better.
I think that the merging layer should be as efficient as it reasonably
can be, and particularly should take into account plugging.  This
benefits all callers.

Yes, but plugging does not mean "please merge anything you can until the 
unplug".

If it can be demonstrated that changes to some of the upper layers bring
further improvements with acceptable costs, then certainly it is good to
have those too.

Generating millions of requests only to merge them again is 
inefficient.  It happens in an edge case (TRIM of the entirety of a very 
large RAID), but it already caused one user to believe the system
failed.  I think the system should be more robust than that.

NeilBrown
