On 10/06/2011 05:22, Namhyung Kim wrote:
NeilBrown<neilb@xxxxxxx> writes:
On Thu, 09 Jun 2011 13:32:59 +0200 David Brown<david@xxxxxxxxxxxxxxx> wrote:
You can see the current kernel code at:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree;f=lib/raid6;h=970c541a452d3b9983223d74b10866902f1a47c7;hb=HEAD
int.uc is the generic C code which 'unroll.awk' processes to make various
versions that unroll the loops different amounts to work with CPUs with
different numbers of registers.
Then there is sse1, sse2, altivec which provide the same functionality in
assembler which is optimised for various processors.
And 'recov' has the smarts for doing the reverse calculation when 2 data
blocks, or 1 data and P are missing.
Even if you don't feel up to implementing everything, a start might be
useful. You never know when someone might jump up and offer to help.
Maybe I could help David to some extent. :)
I'm gonna read the raid6 code next week and hope that there is a room I
can help with, FWIW.
No matter how far I manage to get, I'm going to need help from someone
who knows the system and the development process, how the new functions
would fit with the rest of the system, and not least people who can
check and test the code.
Making multiple parity syndromes is easy enough mathematically:
For each parity bit P_j, you have a generator g_j and calculate for d_i
running over all data disks:
P_j = sum((g_j ^ i) . d_i)
Raid5 parity uses g_0 = 1, so it is just the xor.
Raid6 uses g_0 = 1, g_1 = 2.
Any independent generators could be used, of course. I am not sure how
to prove that a set of generators is independent (except in the easy
case of 2 generators, as shown in the raid6.pdf paper) - but brute force
testing over all choices of dead disks is easy enough.
For Raid7, the obvious choice is g_2 = 4 - then we can re-use existing
macros and optimisations.
I've had a brief look at int.uc - the gen_syndrome function is nice and
small, but that's before awk gets at it. I haven't yet tried building
any of this, but extending raid6_int to a third syndrome is, I hope, is
straightforward:
static void raid7_int$#_gen_syndrome(int disks, size_t bytes, void **ptrs)
{
u8 **dptr = (u8 **)ptrs;
u8 *p, *q, *r;
int d, z, z0;
unative_t wd$$, wq$$, wp$$, w1$$, w2$$, wr$$, w3$$, w4$$;
z0 = disks - 4; /* Highest data disk */
p = dptr[z0+1]; /* XOR parity */
q = dptr[z0+2]; /* RS syndrome */
r = dptr[z0+3]; /* RS syndrome 2 */
for ( d = 0 ; d < bytes ; d += NSIZE*$# ) {
wr$$ = wq$$ = wp$$ = *(unative_t *)&dptr[z0][d+$$*NSIZE];
for ( z = z0-1 ; z >= 0 ; z-- ) {
wd$$ = *(unative_t *)&dptr[z][d+$$*NSIZE];
wp$$ ^= wd$$;
w2$$ = MASK(wq$$);
w1$$ = SHLBYTE(wq$$);
w2$$ &= NBYTES(0x1d);
w1$$ ^= w2$$;
wq$$ = w1$$ ^ wd$$;
w4$$ = MASK(wr$$);
w3$$ = SHLBYTE(wr$$);
w4$$ &= NBYTES(0x1d);
w3$$ ^= w4$$;
w4$$ = MASK(w3$$);
w3$$ = SHLBYTE(w3$$);
w4$$ &= NBYTES(0x1d);
w3$$ ^= w4$$;
wr$$ = w3$$ ^ wd$$;
}
*(unative_t *)&p[d+NSIZE*$$] = wp$$;
*(unative_t *)&q[d+NSIZE*$$] = wq$$;
*(unative_t *)&r[d+NSIZE*$$] = wr$$;
}
}
I wrote the wr$$ calculations using a second set of working variables.
I don't know (yet) what compiler options are used to generate the target
code, nor what the awk'ed code looks like. If the compiler can handle
the scheduling to interlace the Q and R calculations and reduce pipeline
delays, then that's great. If not, then they can be manually interlaced
if it helps the code. But as I'm writing this blind (on a windows
machine, no less), I don't know if it makes a difference.
As a general point regarding optimisations, is there a minimum level of
gcc we can expect to have here? And are there rules about compiler
flags? Later versions of gcc are getting very smart about
vectorisation, and generating multiple versions of code that are
selected automatically at run-time depending on the capabilities of the
processor. My hope is that the code can be greatly simplified by
writing a single clear version in C that runs fast and takes advantage
of the cpu, without needing to make explicit SSE, AtliVec, etc.,
versions. Are there rules about which gcc attributes or extensions we
can use?
Another question - what should we call this? Raid 7? Raid 6.3? Raid
7.3? While we are in the mood for triple-parity Raid, we can easily
extend it to quad-parity or more - the name should probably be flexible
enough to allow that.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html