On Thu, Mar 10, 2022 at 3:02 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 3/10/22 3:37 PM, Song Liu wrote:
> > On Thu, Mar 10, 2022 at 2:15 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>
> >> On 3/8/22 11:42 PM, Song Liu wrote:
> >>> RAID arrays check/repair operations benefit a lot from merging requests.
> >>> If we only check the previous entry for merge attempt, many merge will be
> >>> missed. As a result, significant regression is observed for RAID check
> >>> and repair.
> >>>
> >>> Fix this by checking more than just the previous entry when
> >>> plug->multiple_queues == true.
> >>>
> >>> This improves the check/repair speed of a 20-HDD raid6 from 19 MB/s to
> >>> 103 MB/s.
> >>
> >> Do the underlying disks not have an IO scheduler attached? Curious why
> >> the merges aren't being done there, would be trivial when the list is
> >> flushed out. Because if the perf difference is that big, then other
> >> workloads would be suffering they are that sensitive to being within a
> >> plug worth of IO.
> >
> > The disks have mq-deadline by default. I also tried kyber, the result
> > is the same. Raid repair work sends IOs to all the HDDs in a
> > round-robin manner. If we only check the previous request, there isn't
> > much opportunity for merge. I guess other workloads may have different
> > behavior?
>
> Round robin one at the time? I feel like there's something odd or
> suboptimal with the raid rebuild, if it's that sensitive to plug
> merging.

It is not one request at a time, but more like (for raid456):
   read 4kB from HDD1, HDD2, HDD3...,
   then read another 4kB from HDD1, HDD2, HDD3, ...

> Plug merging is mainly meant to reduce the overhead of merging,
> complement what the scheduler would do. If there's a big drop in
> performance just by not getting as efficient merging on the plug side,
> that points to an issue with something else.

We introduced blk_plug_max_rq_count() to give md more opportunities to
merge at plug side, so I guess the behavior has been like this for a
long time. I will take a look at the scheduler side and see whether we
can just merge later, but I am not very optimistic about it.

Thanks,
Song
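
For illustration only, here is a toy model of the access pattern described
above. It is not kernel code and does not reproduce blk_attempt_plug_merge();
the names Request, simulate_plug and lookback are made up for this sketch. It
assumes 4kB reads (8 sectors of 512 bytes) issued round-robin across the
member disks, and a plug list that tries a back merge against only the last
"lookback" entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    disk: int     # index of the member disk this request targets
    start: int    # first sector of the request
    sectors: int  # length in 512-byte sectors

    @property
    def end(self) -> int:
        return self.start + self.sectors


def simulate_plug(n_disks: int, rounds: int, lookback: int,
                  sectors: int = 8) -> List[Request]:
    """Issue `rounds` round-robin passes of reads over `n_disks` disks and
    try to back-merge each new read into one of the last `lookback`
    entries of the (hypothetical) plug list."""
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    plug: List[Request] = []
    for r in range(rounds):
        for disk in range(n_disks):
            new = Request(disk, r * sectors, sectors)
            # Scan the most recent entries first, newest to oldest.
            for cand in reversed(plug[-lookback:]):
                if cand.disk == disk and cand.end == new.start:
                    cand.sectors += new.sectors  # back merge
                    break
            else:
                plug.append(new)  # no merge candidate found
    return plug


if __name__ == "__main__":
    for lookback in (1, 20):
        plug = simulate_plug(n_disks=20, rounds=16, lookback=lookback)
        print(f"lookback={lookback:2d}: {len(plug)} requests on the plug list")
```

In this model, with 20 disks and 16 passes, checking only the previous entry
never finds a request for the same disk, so all 320 reads stay separate.
Checking the last 20 entries merges every pass into the existing request for
that disk, leaving 20 requests of 128 sectors each. The model ignores the
plug flush limit and the IO scheduler, so it says nothing about where the
merging should ideally happen.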