Re: [PATCH] raid5: add support for rmw writes in raid6

David Brown <david.brown@xxxxxxxxxxxx> · Mon, 29 Apr 2013 21:28:24 +0200

On 29/04/13 18:11, Kumar Sundararajan wrote:

On 4/29/13 12:10 AM, "David Brown" <david.brown@xxxxxxxxxxxx> wrote:

On 29/04/13 03:29, NeilBrown wrote:
On Fri, 26 Apr 2013 14:35:27 -0700 Dan Williams <djbw@xxxxxx> wrote:

On Sun, Apr 21, 2013 at 9:22 PM, NeilBrown <neilb@xxxxxxx> wrote:
On Wed, 17 Apr 2013 17:56:56 -0700 Dan Williams <djbw@xxxxxx> wrote:

From: Kumar Sundararajan <kumar@xxxxxx>

Add support for rmw writes in raid6. This improves small write
performance for most workloads.
Since there may be some configurations and workloads where rcw
always outperforms rmw,
this feature is disabled by default. It can be enabled via sysfs.
For example,

       #echo 1 > /sys/block/md0/md/enable_rmw

Signed-off-by: Kumar Sundararajan <kumar@xxxxxx>
Signed-off-by: Dan Williams <djbw@xxxxxx>
---
Hi Neil,

We decided to leave the enable in for the few cases where forced-rcw
outperformed rmw and there may be other cases out there.

Thoughts?

- More commentary would help.  The text at the top should explain
enough so
   when I read the code I am just verifying the text at the top, not
trying to
   figure out how it is supposed to work.

Ok, along the lines of:

"raid5 has supported sub-stripe writes by computing:

P_new = P_old + D0_old + D0_new

The '0' looks out of place.  All 'D's are treated the same, so I would
write
it
    P_new = P_old + D_old + D_new
though that is a small point.

For raid6 we add the following:

P_new = P_old + D0_old + D0_new

Correct (though I agree with Neil that "0" should be replaced by "i").

Q_sub = Q_old + syndrome(D0_old)

Wrong - using this terminology, Q_sub = syndrome(D0_old)

Q_sub = Q_old + g^0*D0_old
Q_new = Q_old + Q_sub + g^0*D0_new

But here the different Ds are different, so I would be using "Di" or
similar.

   P_new = P_old + Di_old + Di_new

Correct.

and I assume the two 'Q_sub' lines are meant to mean the same thing, so
let's
take the second:

   Q_sub = Q_old + g^i * Di_old
   Q_new = Q_old + Q_sub + g^i * Di_new

You've still got one "Q_old" too many, which means your algebra is wrong
(Q_new reduces here to g^i * Di_new, which is clearly wrong).  But
you've jumped to almost the right conclusion on the next line, proving
that two wrongs sometimes do make a right!

which looks odd as when that expands out, we add Q_old to Q_old and as
'+'
is 'xor', it disappears.  Maybe you mean:

   Q_new = Q_sub + g^i * Di_new
??

Q_sub = g^i * Di_old
Q_new = Q_old - Q_sub + g^i * Di_new
      = Q_old + g^i * Di_old + g^i * Di_new   (since "-" = "+")
      = Q_old + g^i * (Di_old + Di_new)

I can't say for sure (I know the maths, not the code), but I would
expect this final version do be the most efficient for the calculations.
By xor'ing Di_old and Di_new first, then multiplying by g^i, you'll
half the number of costly g^i multiplications.

Yes, I did consider this. But this scheme is harder to implement with how
the rest of the code works,
especially with writes covering more than one Di.

For each data block you are changing, you will need to remove the old 
g^i * Di_old then add in the new g^i * Di_new, so you can still use this 
simplification to reduce the number of multiplies.  If you want to 
change blocks "i" and "j", you thus do:

Q_new = Q_old + g^i * (Di_old + Di_new) + g^j * (Dj_old + Dj_new)

But as I say, I only know the maths - not the code.

mvh.,

David

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html