On Wed, 13 Aug 2014 07:21:20 +0000 Markus Stockhausen <stockhausen@xxxxxxxxxxx> wrote:
> Hello you two,
>
> I saw Shaohua's patches for making the stripe size in raid4/5/6 configurable.
> If I got it right, Neil likes the idea but does not agree with the way it is
> implemented.
>
> The patch is quite big and intrusive, so I guess that any other design will have
> the same complexity. Neil's idea about linking stripe headers sounds reasonable
> but will make it necessary to "look at the linked neighbours" for some operations,
> whatever "look" means programmatically. So I would like to hear your feedback
> about the following design.
>
> Would it make sense to work with per-stripe sizes? E.g.
>
> User reads/writes 4K -> Work on a 4K stripe.
> User reads/writes 16K -> Work on a 16K stripe.
>
> Difficulties:
>
> - avoid overlapping of "small" and "big" stripes
> - split stripe cache in different sizes
> - Can we allocate multi-page memory to have continuous work areas?
> - ...
>
> Benefits:
>
> - Stripe handling unchanged.
> - parity calculation more efficient
> - ...
>
> Other ideas?

I fear that we are chasing the wrong problem.

The scheduling of stripe handling is currently very poor. If you do a large sequential write which should map to multiple full-stripe writes, you still get a lot of reads. This is bad.

The reason is that limited information is available to the raid5 driver concerning what is coming next, and it often guesses wrongly. I suspect that it can be made a lot cleverer, but I'm not entirely sure how.

A first step would be to "watch" exactly what happens in terms of the way that requests come down, the timing of 'unplug' events, and the actual handling of stripes. 'blktrace' could provide most or all of the raw data.

Then determine what the trace "should" look like and come up with a way for raid5 to figure that out and do it.
I suspect that might involve a more "clever" queuing algorithm, possibly keeping all the stripe_heads sorted, possibly storing them in an RB-tree.

Once you have that queuing in place, so that the pattern of write requests submitted to the drives makes sense, then it is time to analyse CPU efficiency and find out where double-handling is happening, or where "batching" or re-ordering of operations can make a difference. If the queuing algorithm collects contiguous sequences of stripe_heads together, then processing a batch of them in succession may provide the same improvements as processing fewer, larger stripe_heads.

So: the first step is to get the IO patterns optimal. Then look for ways to optimise for CPU time.

NeilBrown
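The idea of keeping pending stripes sorted and handling contiguous runs as one batch can be illustrated with a small userspace C sketch. This is not md/raid5 code: `struct pending_stripe` and `batch_contiguous()` are hypothetical names, each pending stripe is reduced to just its starting sector, stripes are assumed to start every `stripe_sectors` sectors, and a sorted array (via `qsort`) stands in for the RB-tree mentioned above, purely to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a pending stripe_head:
 * only the starting sector of the stripe matters for this sketch. */
struct pending_stripe {
    unsigned long long sector;
};

/* qsort comparator: order stripes by ascending starting sector. */
static int cmp_sector(const void *a, const void *b)
{
    unsigned long long x = ((const struct pending_stripe *)a)->sector;
    unsigned long long y = ((const struct pending_stripe *)b)->sector;
    return (x > y) - (x < y);
}

/* Sort the n pending stripes in place, then split them into batches of
 * contiguous stripes (each starting exactly stripe_sectors after the
 * previous one). batch_len must have room for n entries; batch_len[i]
 * receives the number of stripes in batch i. Returns the batch count. */
static size_t batch_contiguous(struct pending_stripe *s, size_t n,
                               unsigned long long stripe_sectors,
                               size_t *batch_len)
{
    size_t nbatches = 0;

    assert(stripe_sectors > 0);
    if (n == 0)
        return 0;

    qsort(s, n, sizeof(*s), cmp_sector);

    batch_len[0] = 1;
    for (size_t i = 1; i < n; i++) {
        if (s[i].sector == s[i - 1].sector + stripe_sectors) {
            /* Adjacent to the previous stripe: extend the current batch. */
            batch_len[nbatches]++;
        } else {
            /* Gap in the sequence: start a new batch. */
            nbatches++;
            batch_len[nbatches] = 1;
        }
    }
    return nbatches + 1;
}
```

For pending stripes at sectors 24, 0, 8, 40, 16 and 48 with 8-sector stripes, this yields two batches: four stripes starting at sector 0 and two starting at sector 40. An RB-tree keyed on sector would maintain the same ordering incrementally as requests arrive, so an in-order walk could find the contiguous runs without re-sorting.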