On Tue, 9 Sep 2014 11:33:13 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx> wrote:

> Hi Neil,
>
>
> On Tue, Sep 9, 2014 at 4:45 AM, NeilBrown <neilb@xxxxxxx> wrote:
> > On Mon, 8 Sep 2014 16:55:52 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> > wrote:
> >
> >> Hi Neil,
> >> We have been seeing high latency on the md/raid1 block device, due to
> >> the fact that all WRITEs are handed off to the raid1d thread. This thread
> >> also calls bitmap_unplug(), which writes the bitmap synchronously.
> >> While it waits for the bitmap, it cannot trigger other WRITEs waiting
> >> in its pending_bio_list. This is especially visible with SSDs: MD's
> >> latency is much higher than SSD latency (I was stoned by Peter
> >> Grandi when I brought up this issue previously for raid5).
> >>
> >> Then I noticed this commit:
> >>
> >> commit f54a9d0e59c4bea3db733921ca9147612a6f292c
> >> Author: NeilBrown <neilb@xxxxxxx>
> >> Date:   Thu Aug 2 08:33:20 2012 +1000
> >>
> >>     md/raid1: submit IO from originating thread instead of md thread.
> >>
> >> Looking at the code, I learned that to avoid switching into raid1d,
> >> the caller has to use blk_start_plug/blk_finish_plug. So I added these
> >> calls in our kernel module, which submits bios to MD. The results were
> >> awesome: MD latency went down significantly.
> >
> > That's good to hear.
> >
> >>
> >> So I have several questions about this plug/unplug thing.
> >>
> >> 1/ Originally this infrastructure was supposed to help IO schedulers
> >> merge requests. It is useful when one has a bunch of requests to
> >> submit in one shot.
> >
> > That is exactly the whole point of plugging: allow the device to handle a
> > batch of requests together instead of one at a time.
> >
> >> But in MD's case, this infrastructure is used for a different purpose:
> >> not to merge requests (which may help bandwidth, but probably not
> >> latency), but to avoid making raid1d a bottleneck - to be able to
> >> submit requests from multiple threads in parallel, which brings down
> >> latency significantly in our case. Indeed "struct blk_plug" has a
> >> special "cb_list", which is used only by MD.
> >
> > I don't think the way md uses plugging is conceptually different from any
> > other use: it is always about gathering a batch together.
> > "cb_list" is handled by blk_check_plugged(), which is also used by
> > block/umem.c and btrfs.
> >
> > The base plugging code assumes that it is only gathering a batch of requests
> > for a single device - if the target device changes then the batch is flushed.
> > It also assumed that it was "struct request" that was batched. For devices
> > like md that want to queue 'struct bio', something else was needed.
> > Also, with layered devices it can be useful to gather multiple batches for
> > multiple layers.
> > So I created "cb_list" etc. and a more generic interface.
> >
> >> In my case I have only individual bios (not a bunch of bios), and
> >> after I wrap them with plug/unplug, MD latency gets better. So we are
> >> using the plug infrastructure for a different purpose.
> >> Is my understanding correct? Was this your intention?
> >
> > I don't really understand what you are doing. There is no point in using
> > plugging for individual bios. The main point for raid1 writes is to gather
> > a lot of writes together so that multiple bitmap bits can be set all at
> > once.
> > It should be possible to submit individual bios directly from make_request
> > without passing them to raid1d and without using plugging.
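
Just so we are definitely talking about the same thing: I assume the change
you made in your module amounts to something like the sketch below, around
each bio you hand to md.  This is only illustrative - I haven't seen your
code, and the function and names here are made up:

	#include <linux/bio.h>
	#include <linux/blkdev.h>

	/*
	 * Illustrative sketch only, not code from any real module.
	 * Submit one write to an md/raid1 array with the submitting thread
	 * plugged.  raid1's make_request then finds the plug via
	 * blk_check_plugged() and queues the bio on the plug's callback
	 * list instead of on conf->pending_bio_list, and blk_finish_plug()
	 * runs raid1_unplug() in this thread, so the bitmap write and the
	 * real submission happen here rather than in raid1d.
	 */
	static void example_submit_write(struct bio *bio)
	{
		struct blk_plug plug;

		blk_start_plug(&plug);
		generic_make_request(bio);
		blk_finish_plug(&plug);
	}

If that is what you are doing, it explains the latency improvement you see
even for a single bio: the bitmap update and the WRITE are issued from your
thread instead of waiting for raid1d.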
> Can you please explain how it is possible?
> You have this code for WRITEs:
>     cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
>     if (cb)
>         plug = container_of(cb, struct raid1_plug_cb, cb);
>     else
>         plug = NULL;
>     spin_lock_irqsave(&conf->device_lock, flags);
>     if (plug) {
>         bio_list_add(&plug->pending, mbio);
>         plug->pending_cnt++;
>     } else {
>         bio_list_add(&conf->pending_bio_list, mbio);
>         conf->pending_count++;
>     }
>     spin_unlock_irqrestore(&conf->device_lock, flags);
>
> If blk_check_plugged returns NULL for the calling thread, then you always
> hand the WRITE to raid1d. So the only option to avoid the handoff to raid1d
> is for the caller to plug. Otherwise, all WRITEs are handed off to raid1d
> and latency becomes terrible.
> So in my case, I use plug/unplug for individual bios only to avoid the
> handoff to raid1d.
> What am I missing in this analysis?

If blk_check_plugged succeeds, then it has arranged for raid1_unplug to be
called a little later by that same process.  So there is nothing to stop you
calling raid1_unplug immediately.

raid1_unplug essentially does:

    bitmap_unplug()
    generic_make_request()

so you can very nearly just do that, without any plugging.
There is a bit of extra subtlety, but I can't really know how relevant that
might be to you without actually seeing your code.

NeilBrown
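
P.S. For reference, here is roughly what raid1_unplug looks like when it is
run from blk_finish_plug().  This is a trimmed-down paraphrase of
drivers/md/raid1.c from memory, not the verbatim source, so check the tree
you are actually running:

	static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
	{
		struct raid1_plug_cb *plug = container_of(cb, struct raid1_plug_cb, cb);
		struct mddev *mddev = plug->cb.data;
		struct r1conf *conf = mddev->private;
		struct bio *bio;

		if (from_schedule) {
			/* unplugged because the task is scheduling: push the
			 * queued bios back onto conf->pending_bio_list and let
			 * raid1d submit them - the path you are trying to avoid */
			spin_lock_irq(&conf->device_lock);
			bio_list_merge(&conf->pending_bio_list, &plug->pending);
			conf->pending_count += plug->pending_cnt;
			spin_unlock_irq(&conf->device_lock);
			md_wakeup_thread(mddev->thread);
			kfree(plug);
			return;
		}

		/* we aren't scheduling, so do the work here: flush the bitmap,
		 * then submit the queued writes from this thread */
		bitmap_unplug(mddev->bitmap);
		wake_up(&conf->wait_barrier);

		while ((bio = bio_list_pop(&plug->pending))) {
			bio->bi_next = NULL;
			generic_make_request(bio);
		}
		kfree(plug);
	}

When it is called from blk_finish_plug() in your submitting thread it takes
the second path, which is the bitmap_unplug() + generic_make_request()
sequence mentioned above.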