On Fri, Mar 03 2017, Shaohua Li wrote: > On Fri, Mar 03, 2017 at 02:43:49PM +1100, Neil Brown wrote: >> On Fri, Feb 17 2017, Shaohua Li wrote: >> >> > Previous patch (raid5: only dispatch IO from raid5d for harddisk raid) >> > defers IO dispatching. The goal is to create better IO pattern. At that >> > time, we don't sort the deffered IO and hope the block layer can do IO >> > merge and sort. Now the raid5-cache writeback could create large amount >> > of bios. And if we enable muti-thread for stripe handling, we can't >> > control when to dispatch IO to raid disks. In a lot of time, we are >> > dispatching IO which block layer can't do merge effectively. >> > >> > This patch moves further for the IO dispatching defer. We accumulate >> > bios, but we don't dispatch all the bios after a threshold is met. This >> > 'dispatch partial portion of bios' stragety allows bios coming in a >> > large time window are sent to disks together. At the dispatching time, >> > there is large chance the block layer can merge the bios. To make this >> > more effective, we dispatch IO in ascending order. This increases >> > request merge chance and reduces disk seek. >> >> I can see the benefit of batching and sorting requests. >> >> I wonder if the extra complexity of grouping together 512 requests, then >> submitting the "first" 128 is really worth it. Have you measured the >> value of that? > > I'm pretty sure I tried. The whole point of dispatching the first 128 is we > don't have a better pipeline. Grouping 512 and then dispatching them together > definitely improve the IO patter, but the request accumulation takes time, we > will have no IO running in the window. But we don't wait for the first batch before we start collecting the next batch - do we? Why would there be a window with no IO running? > >> If you just submitted every time you got 512 requests, you could use >> list_sort() on the bio list and wouldn't need an array. >> >> If an array really is best, it would be really nice if "sort" could pass >> a 'void*' down to the cmp function, >> and it could sort all bios that are >> *after* last_bio_pos first, and then the others. That would make the >> code much simpler. I guess sort() could be changed (list_sort() already >> has a 'priv' argument like this). > > Ok, I'll change this to a list. And add extra pointer to record the last sorted > entry. I didn't see the sort uses much time in my profile, but the merge sort > looks better. Will do the change. I think both sorts are O(log(N)). I had thought that list_sort() would work on a bio_list, but it requires a list_head (even though it doesn't use the prev pointer). If it worked on a bio_list and if you could just submit the whole batch, then using list_sort would have meant that you don't need to allocate a table of r5pending_data. Now with the struct list_head in there, the data is twice the size. I guess that doesn't matter too much. It just feels like there should be a cleaner solution, but I cannot find it without writing a new sort function (not that it would be so hard do to that). Thanks, NeilBrown > >> If we cannot change sort(), then maybe use lib/bsearch.c for the binary >> search. Performing two comparisons in the loop of a binary search >> should get a *fail* in any algorithms class!! >> The "pending_data" array that you have added to the r5conf structure >> adds 4096 bytes. This means it is larger than a page, which is best >> avoided (though it is unlikely to cause problems). I would allocate it >> separately. > > Yep, already fixed internally. > >> >> So there is a lot that I don't really like, but it seems like a good >> idea in principle. > > ok, thanks for your time! > > Thanks, > Shaohua
Attachment:
signature.asc
Description: PGP signature