On Tue, Nov 29, 2016 at 5:45 PM, Avi Kivity <avi@xxxxxxxxxxxx> wrote:
> On 11/29/2016 11:14 PM, NeilBrown wrote:
>>
>> On Mon, Nov 28 2016, Avi Kivity wrote:
>>>>
>>>> If it is easy for the upper layer to break a very large request into a
>>>> few very large requests, then I wouldn't necessarily object.
>>>
>>> I can't see why it would be hard.  It's simple arithmetic.
>>
>> That is easy to say before writing the code :-)
>> It probably is easy for RAID0.  Less so for RAID10.  Even less for
>> RAID6.
>>
>
> Pick the largest subrange within the input range whose bounds are 0 (mod
> stripe-size); TRIM it (for all members); apply the regular algorithm to the
> head and tail subranges.  Works for all RAID types.  If the RAID is
> undergoing reshaping, exclude the range undergoing reshaping, and treat the
> two halves separately.
>
>>>> But unless it is very hard for the lower layer to merge requests, it
>>>> should be doing that too.
>>>
>>> Merging has tradeoffs.  When you merge requests R1, R2, ... Rn you make
>>> the latency of request R1 the sum of the latencies of R1..Rn.  You may
>>> gain some efficiency in the process, but that's not going to make up for
>>> a factor of n.  The queuing layer has no way to tell whether the caller
>>> is interested in the latency of individual requests.  By sending large
>>> requests, the caller indicates it's not interested in the latency of
>>> individual subranges.  The queuing layer is still free to internally
>>> split the request into smaller ranges, to satisfy hardware constraints,
>>> or to reduce worst-case latencies for competing request streams.
>>
>> I would have thought that using plug/unplug to group requests is a
>> fairly strong statement that they can be handled as a unit if that is
>> convenient.
>
> It is not.  As an example, consider a read and a few associated read-ahead
> requests submitted in a batch.  The last thing you want is for them to be
> treated as a unit.
>
> Plug/unplug means: I have a bunch of requests here.  Whether they should be
> merged or reordered is orthogonal to whether they are members of a batch or
> not.
>
>>> So I disagree that all the work should be pushed to the merging layer.
>>> It has less information to work with, so the fewer decisions it has to
>>> make, the better.
>>
>> I think that the merging layer should be as efficient as it reasonably
>> can be, and particularly should take into account plugging.  This
>> benefits all callers.
>
> Yes, but plugging does not mean "please merge anything you can until the
> unplug".
>
>> If it can be demonstrated that changes to some of the upper layers bring
>> further improvements with acceptable costs, then certainly it is good to
>> have those too.
>
> Generating millions of requests only to merge them again is inefficient.
> It happens in an edge case (TRIM of the entirety of a very large RAID),
> but it already caused one user to believe the system had failed.  I think
> the system should be more robust than that.

DM's stripe target (raid0) has had Avi's proposed implementation (breaking a
large discard into N discards, one per component device) for a long while.
See commit 7b76ec11fec4 ("dm stripe: support discards").

This improvement applies to other raid levels too.

And I completely agree that it would be a mistake to continue to have the
block layer issue millions of chunk-sized bios only to _maybe_ have them be
merged by the IO scheduler and/or plugging.  Needless make-work that just
eats cpu and memory.
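
For what it's worth, here is a minimal userspace sketch of the split Avi
describes above: carve the largest stripe-aligned middle out of the discard
range (one large TRIM per member) and leave only the unaligned head and tail
for the regular chunk-by-chunk path.  The names (split_discard, struct
subrange) and the sector math are purely illustrative, not existing kernel
interfaces:

/* Sketch only: arithmetic for splitting a discard into head/mid/tail. */
#include <stdio.h>
#include <stdint.h>

struct subrange {
	uint64_t start;	/* in sectors */
	uint64_t len;	/* in sectors */
};

static void split_discard(uint64_t start, uint64_t len,
			  uint64_t stripe_sectors,
			  struct subrange *head, struct subrange *mid,
			  struct subrange *tail)
{
	uint64_t end = start + len;
	/* Round start up and end down to stripe boundaries. */
	uint64_t mid_start = (start + stripe_sectors - 1) / stripe_sectors
			     * stripe_sectors;
	uint64_t mid_end = end / stripe_sectors * stripe_sectors;

	if (mid_start >= mid_end) {
		/* Range holds no full stripe: everything goes the slow way. */
		*head = (struct subrange){ start, len };
		*mid  = (struct subrange){ 0, 0 };
		*tail = (struct subrange){ 0, 0 };
		return;
	}
	*head = (struct subrange){ start, mid_start - start };
	*mid  = (struct subrange){ mid_start, mid_end - mid_start };
	*tail = (struct subrange){ mid_end, end - mid_end };
}

int main(void)
{
	struct subrange head, mid, tail;

	/* e.g. 4-member RAID0 with 512KiB chunks: 4MiB stripe = 8192 sectors */
	split_discard(1000, 10000000, 8192, &head, &mid, &tail);
	printf("head: %llu+%llu (chunk-by-chunk)\n",
	       (unsigned long long)head.start, (unsigned long long)head.len);
	printf("mid:  %llu+%llu (one large TRIM per member)\n",
	       (unsigned long long)mid.start, (unsigned long long)mid.len);
	printf("tail: %llu+%llu (chunk-by-chunk)\n",
	       (unsigned long long)tail.start, (unsigned long long)tail.len);
	return 0;
}

The point is just that the split is a handful of divisions per discard,
rather than one bio per chunk; how the aligned middle is then mapped onto
each member is where the per-level (RAID0/10/6) work would actually live.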