> From: NeilBrown [neilb@xxxxxxx]
> Sent: Thursday, 14 August 2014 06:11
> To: Markus Stockhausen
> Cc: shli@xxxxxxxxxx; linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: Bigger stripe size
> ...
> >
> > Will it make sense to work with per-stripe sizes? E.g.
> >
> > User reads/writes 4K -> Work on a 4K stripe.
> > User reads/writes 16K -> Work on a 16K stripe.
> >
> > Difficulties.
> >
> > - avoid overlapping of "small" and "big" stripes
> > - split stripe cache in different sizes
> > - Can we allocate multi-page memory to have contiguous work areas?
> > - ...
> >
> > Benefits.
> >
> > - Stripe handling unchanged.
> > - parity calculation more efficient
> > - ...
> >
> > Other ideas?
>
> I fear that we are chasing the wrong problem.
>
> The scheduling of stripe handling is currently very poor. If you do a large
> sequential write which should map to multiple full-stripe writes, you still
> get a lot of reads. This is bad.
> The reason is that limited information is available to the raid5 driver
> concerning what is coming next and it often guesses wrongly.
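
To make the cost of those extra reads concrete, here is a minimal userspace sketch in Python (not kernel code). It assumes a simplified RAID5 model with nothing cached: a partial write either reads the old data plus the old parity (read-modify-write) or reads the untouched data blocks (reconstruct-write), and the cheaper one is chosen. The function name `reads_needed` is hypothetical.

```python
def reads_needed(data_disks: int, written: int) -> int:
    """Pre-reads for one RAID5 stripe when `written` of its
    `data_disks` data blocks are overwritten (simplified model)."""
    if not 0 < written <= data_disks:
        raise ValueError("written must be between 1 and data_disks")
    # Read-modify-write: read the old copies of the blocks being
    # replaced, plus the old parity block.
    rmw = written + 1
    # Reconstruct-write: read the data blocks that are NOT written,
    # then recompute parity from scratch.
    rcw = data_disks - written
    return min(rmw, rcw)
```

In this model a full-stripe write on four data disks, `reads_needed(4, 4)`, needs no reads at all, while a single-block write, `reads_needed(4, 1)`, needs two. A sequential write that gets handled before all of its blocks have arrived ends up in the second case.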
>
> I suspect that it can be made a lot cleverer but I'm not entirely sure how.
> A first step would be to "watch" exactly what happens in terms of the way
> that requests come down, the timing of 'unplug' events, and the actual
> handling of stripes. 'blktrace' could provide most or all of the raw data.
>
Thanks for that info. I did not expect to find such basic challenges in the code ...
Could you explain what you mean by unplug events? Maybe you can point me to
the function in raid5.c that is the right place to understand how changed
data "leaves" the stripes and how the stripes are put back on the free lists.
>
> Then determine what the trace "should" look like and come up with a way for
> raid5 to figure that out and do it.
> I suspect that might involve a more "clever" queuing algorithm, possibly
> keeping all the stripe_heads sorted, possibly storing them in an RB-tree.
>
> Once you have that queuing in place so that the pattern of write requests
> submitted to the drives makes sense, then it is time to analyse CPU efficiency
> and find out where double-handling is happening, or when "batching" or
> re-ordering of operations can make a difference.
> If the queuing algorithm collects contiguous sequences of stripe_heads
> together, then processing a batch of them in succession may provide the same
> improvements as processing fewer larger stripe_heads.
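
The idea of collecting contiguous stripe_heads could be modelled roughly like this. It is a userspace Python sketch, not raid5.c code, and it makes two assumptions: each stripe is identified only by its start sector, and one stripe covers 8 sectors (a 4K page with 512-byte sectors). The names `batch_contiguous` and `STRIPE_SECTORS` are chosen for illustration here.

```python
from typing import Iterable, List

STRIPE_SECTORS = 8  # assumption: 4K stripe unit, 512-byte sectors


def batch_contiguous(stripe_sectors: Iterable[int],
                     step: int = STRIPE_SECTORS) -> List[List[int]]:
    """Sort stripe start sectors and group neighbours into batches."""
    batches: List[List[int]] = []
    for sector in sorted(set(stripe_sectors)):
        if batches and sector == batches[-1][-1] + step:
            batches[-1].append(sector)   # directly follows the previous stripe
        else:
            batches.append([sector])     # gap found: start a new batch
    return batches
```

Calling `batch_contiguous([16, 0, 8, 40, 48, 100])` yields three batches: `[0, 8, 16]`, `[40, 48]` and `[100]`. A sorted list stands in for the RB-tree here; a tree would give the same ordering while keeping insertion cheap as stripes arrive.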
>
> So: first step is to get the IO patterns optimal. Then look for ways to
> optimise for CPU time.
>
> NeilBrown
Markus