Re: Triple-parity raid6

On 10/06/2011 05:22, Namhyung Kim wrote:
> NeilBrown <neilb@xxxxxxx> writes:
>> On Thu, 09 Jun 2011 13:32:59 +0200 David Brown <david@xxxxxxxxxxxxxxx> wrote:
>>
>> You can see the current kernel code at:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree;f=lib/raid6;h=970c541a452d3b9983223d74b10866902f1a47c7;hb=HEAD
>>
>> int.uc is the generic C code which 'unroll.awk' processes to make
>> various versions that unroll the loops by different amounts, to work
>> with CPUs that have different numbers of registers.
>> Then there are sse1, sse2 and altivec versions, which provide the same
>> functionality in assembler optimised for various processors.
>>
>> And 'recov' has the smarts for doing the reverse calculation when 2
>> data blocks, or 1 data block and P, are missing.
>>
>> Even if you don't feel up to implementing everything, a start might be
>> useful.  You never know when someone might jump up and offer to help.
>
> Maybe I could help David to some extent. :)
>
> I'm gonna read the raid6 code next week and hope there is room I can
> help with, FWIW.


No matter how far I manage to get, I'm going to need help from people who know the code and the development process, who can see how the new functions would fit in with the rest of the system, and, not least, who can review and test the code.

Making multiple parity syndromes is easy enough mathematically:

For each parity block P_j, you pick a generator g_j and calculate, with d_i running over all the data disks:

	P_j = sum over i of (g_j^i . d_i)

where the powers, products and sums are all in GF(2^8) - the same Galois field arithmetic that raid6 already uses.

Raid5 parity uses g_0 = 1, so it is just the xor of the data.
Raid6 uses g_0 = 1 and g_1 = 2.

Any set of independent generators could be used, of course. I am not sure how to prove that a set of generators is independent (except in the easy case of two generators, as shown in the raid6.pdf paper), but brute-force testing over all choices of dead disks is easy enough - see the sketch below.
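
To show what I mean by brute force, here is a sketch (untested, and all
the helper names are just made up for illustration).  It only covers the
worst case of three dead data disks; the mixed cases with one or two
parity disks dead reduce to 2x2 and 1x1 minors of the same matrix and
can be checked the same way.  The arithmetic is GF(2^8) with the same
0x1d reduction the raid6 code uses:

#include <stdint.h>

/* Multiply in GF(2^8), shift-and-xor version */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
        uint8_t p = 0;

        while (b) {
                if (b & 1)
                        p ^= a;
                a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
                b >>= 1;
        }
        return p;
}

static uint8_t gf_pow(uint8_t g, int e)
{
        uint8_t r = 1;

        while (e--)
                r = gf_mul(r, g);
        return r;
}

/* 3x3 determinant over GF(2^8); in characteristic 2 every "minus" in
 * the cofactor expansion is just another xor */
static uint8_t det3(uint8_t m[3][3])
{
        return gf_mul(m[0][0], gf_mul(m[1][1], m[2][2]) ^ gf_mul(m[1][2], m[2][1]))
             ^ gf_mul(m[0][1], gf_mul(m[1][0], m[2][2]) ^ gf_mul(m[1][2], m[2][0]))
             ^ gf_mul(m[0][2], gf_mul(m[1][0], m[2][1]) ^ gf_mul(m[1][1], m[2][0]));
}

/* Return 1 if the generators g[0..2] can recover any three dead data
 * disks out of data_disks, i.e. every 3x3 matrix of generator powers
 * is nonsingular */
static int check_generators(const uint8_t g[3], int data_disks)
{
        int x, y, z, j;
        uint8_t m[3][3];

        for (x = 0; x < data_disks; x++)
                for (y = x + 1; y < data_disks; y++)
                        for (z = y + 1; z < data_disks; z++) {
                                for (j = 0; j < 3; j++) {
                                        m[j][0] = gf_pow(g[j], x);
                                        m[j][1] = gf_pow(g[j], y);
                                        m[j][2] = gf_pow(g[j], z);
                                }
                                if (det3(m) == 0)
                                        return 0;
                        }
        return 1;
}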

For Raid7, the obvious choice is g_2 = 4 - then we can re-use existing macros and optimisations.
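
To make the arithmetic concrete, here is a plain byte-at-a-time version
of all three syndromes, evaluated by Horner's rule just as int.uc does.
It is untested and written blind like everything else in this mail, and
the function names are only for illustration - gf_mul2() is the usual
multiply-by-2 step from the raid6 paper:

#include <stdint.h>
#include <stddef.h>

static uint8_t gf_mul2(uint8_t v)
{
        /* Multiply by 2 in GF(2^8): shift, and reduce by 0x1d if the
         * top bit fell off */
        return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

static void gen_triple_syndrome_ref(int data_disks, size_t bytes,
                                    uint8_t **d, uint8_t *p,
                                    uint8_t *q, uint8_t *r)
{
        size_t i;
        int z;

        for (i = 0; i < bytes; i++) {
                uint8_t wp = 0, wq = 0, wr = 0;

                /* Horner's rule: run from the highest data disk down,
                 * multiplying the running syndrome by its generator
                 * before xor'ing in the next disk's byte.  That gives
                 * disk i the coefficient g^i, as in the formula above. */
                for (z = data_disks - 1; z >= 0; z--) {
                        wp ^= d[z][i];                          /* g_0 = 1 */
                        wq = gf_mul2(wq) ^ d[z][i];             /* g_1 = 2 */
                        wr = gf_mul2(gf_mul2(wr)) ^ d[z][i];    /* g_2 = 4 */
                }
                p[i] = wp;
                q[i] = wq;
                r[i] = wr;
        }
}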


I've had a brief look at int.uc - the gen_syndrome function is nice and small, but that's before awk gets at it. I haven't yet tried building any of this, but extending raid6_int to a third syndrome is, I hope, straightforward:

static void raid7_int$#_gen_syndrome(int disks, size_t bytes, void **ptrs)
{
        u8 **dptr = (u8 **)ptrs;
        u8 *p, *q, *r;
        int d, z, z0;

        unative_t wd$$, wq$$, wp$$, w1$$, w2$$, wr$$, w3$$, w4$$;

        z0 = disks - 4;         /* Highest data disk */
        p = dptr[z0+1];         /* XOR parity */
        q = dptr[z0+2];         /* RS syndrome */
        r = dptr[z0+3];         /* RS syndrome 2 */

        for ( d = 0 ; d < bytes ; d += NSIZE*$# ) {
                wr$$ = wq$$ = wp$$ = *(unative_t *)&dptr[z0][d+$$*NSIZE];
                for ( z = z0-1 ; z >= 0 ; z-- ) {
                        wd$$ = *(unative_t *)&dptr[z][d+$$*NSIZE];
                        wp$$ ^= wd$$;
                        /* Q: multiply the running syndrome by
                         * g_1 = 2 in GF(2^8), then xor in the data */
                        w2$$ = MASK(wq$$);
                        w1$$ = SHLBYTE(wq$$);
                        w2$$ &= NBYTES(0x1d);
                        w1$$ ^= w2$$;
                        wq$$ = w1$$ ^ wd$$;

                        /* R: multiply by g_2 = 4, done as two
                         * successive multiplications by 2 */
                        w4$$ = MASK(wr$$);
                        w3$$ = SHLBYTE(wr$$);
                        w4$$ &= NBYTES(0x1d);
                        w3$$ ^= w4$$;
                        w4$$ = MASK(w3$$);
                        w3$$ = SHLBYTE(w3$$);
                        w4$$ &= NBYTES(0x1d);
                        w3$$ ^= w4$$;
                        wr$$ = w3$$ ^ wd$$;
                }
                *(unative_t *)&p[d+NSIZE*$$] = wp$$;
                *(unative_t *)&q[d+NSIZE*$$] = wq$$;
                *(unative_t *)&r[d+NSIZE*$$] = wr$$;
        }
}
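
For anyone who hasn't read unroll.awk: as far as I can tell (I haven't
verified this against actual output), it replaces $# with the unroll
factor and emits one copy of each line containing $$ per unroll step,
with $$ replaced by the step number.  So in the 2-way unrolled build the
multiply-by-2 step for Q should come out roughly as:

                        w20 = MASK(wq0);
                        w21 = MASK(wq1);
                        w10 = SHLBYTE(wq0);
                        w11 = SHLBYTE(wq1);
                        w20 &= NBYTES(0x1d);
                        w21 &= NBYTES(0x1d);
                        w10 ^= w20;
                        w11 ^= w21;
                        wq0 = w10 ^ wd0;
                        wq1 = w11 ^ wd1;

which is exactly the sort of independent parallel work a compiler should
be able to schedule well.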


I wrote the wr$$ calculations using a second set of working variables. I don't know (yet) what compiler options are used to generate the target code, nor what the awk'ed code looks like. If the compiler can handle the scheduling itself, interlacing the Q and R calculations to reduce pipeline stalls, then that's great. If not, the two calculations can be interlaced manually. But as I'm writing this blind (on a Windows machine, no less), I don't know whether it makes a difference.

As a general point regarding optimisations, is there a minimum level of gcc we can expect to have here? And are there rules about compiler flags? Later versions of gcc are getting very smart about vectorisation, and about generating multiple versions of code that are selected automatically at run-time depending on the capabilities of the processor. My hope is that the code can be greatly simplified by writing a single clear version in C that runs fast and takes advantage of the cpu, without needing explicit SSE, AltiVec, etc. versions. Are there rules about which gcc attributes or extensions we can use?
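
As an illustration of the sort of thing I mean: newer gcc has function
multi-versioning, where one C function is compiled several times and
dispatched at load time through an ifunc resolver.  I doubt we can
assume a gcc anywhere near new enough for this, so treat it purely as a
sketch (the attribute is real, the function here is hypothetical):

#include <stddef.h>

/* gcc builds one clone per listed target plus a resolver that picks
 * the best clone the running CPU supports */
__attribute__((target_clones("avx2", "sse2", "default")))
void raid7_gen_syndrome(int disks, size_t bytes, void **ptrs)
{
        /* ... the plain C syndrome loop from above ... */
        (void)disks; (void)bytes; (void)ptrs;
}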


Another question - what should we call this? Raid 7? Raid 6.3? Raid 7.3? While we are in the mood for triple-parity Raid, we can easily extend it to quad-parity or more - the name should probably be flexible enough to allow that.


