Re: [PATCH v2 0/6] raid6: support read-modify-write

Markus Stockhausen <stockhausen@xxxxxxxxxxx> · Thu, 21 Aug 2014 07:08:03 +0000

> From: NeilBrown [neilb@xxxxxxx]
> Sent: Thursday, 21 August 2014 06:58
> To: Markus Stockhausen
> Cc: linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 0/6] raid6: support read-modify-write
> 
> 
> Thanks.  This looks a lot nicer.  If you resend them with the formatting and
> s-o-b changes I mentioned I'll put them in my tree and try to do some testing
> and look at bits more closely.
> Two things:
> 1/ It would be good to have performance numbers in the patch description for
>    the patch that makes it all work.  Put yours there now, we can add others
>    later.

I will send the patches in the next few days.

> 2/ When you do a partial syndrome you specify a start and end.
>    Is that really what is always wanted, or what is easiest?
>    I imagine that you might want to "add" or "subtract" an arbitrary subset
>    of blocks.  I imagine that the "blocks" array of pointers that is
>    passed could have NULLs for the blocks to ignore and would include
>    all others in the computation.
>    Is there a good reason for not doing that?

If you take a closer look, the upper layers of the patch do the NULL page
handling, see set_syndrome_sources(). In the syndrome functions this is broken
down into a "start + stop + kernel zero page" design. I opted for that
approach for the following reasons:

- The original algorithms are based on "per-line" syndrome calculation: they
fully calculate x bytes of the syndrome while loading x bytes from all the
source pages. If we did a full calculation for the D0 page, then the D1
page and so on, we would need to load/store the partially calculated P/Q
values multiple times. Additionally the calculation of GF(X) would be quite hard.

- The stop page marker is the main win in our calculations because
everything to the right of it can be ignored.

- NULL pages between data pages are hard to handle. They would lead to
more complexity and additional branches in the assembler routines. The SSE2
implementation needs 34*<number of disks> instructions to calculate 64
bytes of the syndrome. If a disk is zero, its cycle can be reduced
to 20 instructions. In my estimation, chances are high that only
adjacent pages will be written most of the time. So I stay close to the
original design and keep things clean.

- The start page marker is the clear indication for the functions that from
there on they only need the GF(X) multiplication. So once again I do not
switch over to table lookups but stay with the old design. I know that
this is overhead if you change D13 in a 16 disk raid6. But even then
we have only a quite small CPU overhead (factor 2 for calling
xor_syndrome twice per rmw):

- RCW: (14*34 instructions)/64 bytes + 13 read I/Os + 3 write I/Os
- RMW: (2*34+2*13*20 instructions)/64 bytes + 3 read I/Os + 3 write I/Os

On the other hand, changing D0 is a clear, undisputable win for the new
implementation:

- RCW: (14*34 instructions)/64 bytes + 13 read I/Os + 3 write I/Os
- RMW: (2*34 instructions)/64 bytes + 3 read I/Os + 3 write I/Os
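
To make the arithmetic behind the two comparisons easier to check, here is a
rough sketch of the cost model in C. It is not code from the patches: the
names rcw_cost() and rmw_cost() are made up for this mail, the constants 34
and 20 are just the SSE2 figures quoted above, and extending the formula to a
start..stop range of more than one page is my assumption.

```c
/* Instruction counts per 64-byte chunk, taken from the SSE2 figures above. */
enum { FULL_STEP = 34, MUL_ONLY_STEP = 20 };

/* RCW: every data disk takes the full load + multiply + xor step. */
static int rcw_cost(int data_disks)
{
	return data_disks * FULL_STEP;
}

/*
 * RMW: pages start..stop take the full step, pages below start only the
 * GF multiplication. The routine runs twice (old data, then new data).
 * Pages above stop cost nothing.
 */
static int rmw_cost(int start, int stop)
{
	return 2 * ((stop - start + 1) * FULL_STEP + start * MUL_ONLY_STEP);
}
```

With 14 data disks, rcw_cost(14) gives 476 instructions, rmw_cost(13, 13)
gives 588 for the D13 case and rmw_cost(0, 0) gives 68 for the D0 case.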

Conclusion of it all: I had the choice between simply copying the
functions and adding the XOR at the end, my optimizations, or a fully
optimized version. For the third option one needs a lot of real-life sample
data and well-defined performance comparisons. In my opinion all this
distracts from the original goal of saving disk I/Os on spinning media.

So the current design is a good balance between performance, simplicity,
code readability and avoiding spare pages. If you look at the original
implementation, it just called gen_syndrome() twice for rmw and even
with that the numbers were quite impressive.
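
For readers who do not want to dig through the assembler, the start/stop idea
can be sketched byte by byte in plain C. This is only an illustration under
assumptions, not the code from the patches: xor_syndrome_sketch() and
rmw_matches_rcw() are hypothetical names, the real routines work on wide
registers instead of single bytes, and the argument list here is simplified.
The field polynomial 0x11d is the one the existing raid6 code uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiply one byte by x (i.e. by 2) in GF(2^8), polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * XOR the P/Q contribution of data pages start..stop into p and q.
 * Pages above stop are never touched; pages below start are not loaded,
 * they only cost one multiplication each.
 */
static void xor_syndrome_sketch(int start, int stop, size_t bytes,
				uint8_t **data, uint8_t *p, uint8_t *q)
{
	for (size_t d = 0; d < bytes; d++) {
		uint8_t wp = data[stop][d];
		uint8_t wq = wp;

		/* Full step: load, xor into P, multiply and xor into Q. */
		for (int z = stop - 1; z >= start; z--) {
			wp ^= data[z][d];
			wq = (uint8_t)(gf_mul2(wq) ^ data[z][d]);
		}
		/* Left of start: multiplication only. */
		for (int z = start - 1; z >= 0; z--)
			wq = gf_mul2(wq);

		p[d] ^= wp;
		q[d] ^= wq;
	}
}

/* Returns 1 if an rmw update of one page matches a full recompute. */
static int rmw_matches_rcw(void)
{
	enum { NDATA = 4, BYTES = 8, CHANGED = 2 };
	uint8_t pages[NDATA][BYTES], newpage[BYTES];
	uint8_t *data[NDATA];
	uint8_t p[BYTES] = {0}, q[BYTES] = {0};
	uint8_t p_full[BYTES] = {0}, q_full[BYTES] = {0};

	for (int z = 0; z < NDATA; z++) {
		data[z] = pages[z];
		for (int d = 0; d < BYTES; d++)
			pages[z][d] = (uint8_t)(31 * z + 7 * d + 1);
	}
	for (int d = 0; d < BYTES; d++)
		newpage[d] = (uint8_t)(0xa5 ^ (13 * d));

	/* Initial P/Q: starting from zeroed buffers, XOR in all pages. */
	xor_syndrome_sketch(0, NDATA - 1, BYTES, data, p, q);

	/* rmw: subtract the old page, swap in new data, add the new page. */
	xor_syndrome_sketch(CHANGED, CHANGED, BYTES, data, p, q);
	memcpy(pages[CHANGED], newpage, BYTES);
	xor_syndrome_sketch(CHANGED, CHANGED, BYTES, data, p, q);

	/* rcw reference: recompute over all pages. */
	xor_syndrome_sketch(0, NDATA - 1, BYTES, data, p_full, q_full);

	return memcmp(p, p_full, BYTES) == 0 && memcmp(q, q_full, BYTES) == 0;
}
```

Because the syndrome is linear over XOR, calling the routine once with the old
page and once with the new page yields the same P/Q as a full recompute, which
is what rmw_matches_rcw() checks on a small made-up data set.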

> Apart from that the code looks quite nice and clean .... though I don't seem
> to be concentrating at my best today so I reserve the right to revise that
> assessment at a later date :-)
> 
> Thanks,
> NeilBrown

Markus
****************************************************************************
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************