Re: best case / worst case RAID 5,6 write speeds

On 12/17/2015 04:08 PM, Dallas Clement wrote:
> I am still in the process of collecting a bunch of performance data.
> But so far, it is shocking to see the throughput difference when
> blocks written are stripe aligned.

Unaligned random writes carry at least a 4x I/O multiplier on raid5 and
6x on raid6, per my earlier explanation: every sub-stripe write has to
read the old data and old parity before it can write the new data and
new parity.  Why does this surprise you?  It's parity raid.  This is
why users with heavy random workloads are pointed at raid1 and raid10.
I like raid10,f3 for VM host images and databases.
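
For anyone who wants the arithmetic spelled out, here is a tiny Python
sketch of the per-write I/O cost (a back-of-the-envelope model, not
anything from the md code):

  def rmw_ios(parity_disks):
      # A sub-stripe write must read the old data chunk and the old
      # parity chunk(s), then write the new data and new parity.
      reads = 1 + parity_disks
      writes = 1 + parity_disks
      return reads + writes

  print(rmw_ios(1))  # raid5: 4 I/Os to service one unaligned write
  print(rmw_ios(2))  # raid6: 6 I/Os to service one unaligned write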

> However, in the non-ideal world it
> is not always possible to ensure that clients are writing blocks of
> data which are stripe aligned.

Hardly possible at all, except for bulk writes of large media files, and
then only if you are writing one stream at a time to an otherwise idle
storage stack.  Not very realistic in a general-purpose storage
appliance.  "General purpose" just isn't very sequential.
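
To put a number on "hardly possible": a full stripe is the chunk size
times the number of data disks, so it gets big fast on a wide array.  A
quick sketch (the 8-disk / 512 KiB geometry is just an example):

  def full_stripe_bytes(n_disks, parity_disks, chunk_bytes):
      # Only the data-bearing members count toward the stripe width.
      return (n_disks - parity_disks) * chunk_bytes

  # Hypothetical 8-disk raid5 with 512 KiB chunks:
  print(full_stripe_bytes(8, 1, 512 * 1024))  # 3670016 bytes = 3.5 MiB

A client would have to issue 3.5 MiB writes on 3.5 MiB boundaries to
dodge RMW entirely on that geometry.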

> If the goal is to reduce the # of RMWs
> it seems like writing big blocks would also help for sequential
> workloads where large quantities of data are being written.

The goal is to be able to read later what you need to write now.  Unless
you have unlimited $ to spend, you have to balance speed, redundancy,
and capacity.  As they say, pick two.

Lots of spindles is generally good.  Comparing the levels:

              capacity   redundancy   speed
  raid5       great      good         marginal
  raid6       great      great        pitiful
  raid10,f2   poor       good         great
  raid10,f3   pitiful    great        great

> Can any
> of you think of anything else that can be tuned in the kernel to
> reduce # of RMWs in the case where blocks are not stripe aligned?  Is
> it a bad idea to mess with the timing of the stripe cache?

You can't really hold those writes for long, as any serious application
is going to call fdatasync at short intervals, for algorithmic integrity
reasons.  On random workloads, you simply have no choice but to do RMWs.
Your only out is to make complete chunk stripes smaller than your
application's typical write size.  That raises the odds that any
particular write will be aligned or mostly aligned.  Have you tried 4k
chunks?
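
To show why, here's a rough sketch of how much of a randomly placed
write can go out as full-stripe writes, for big vs. small chunks (pure
arithmetic; the 5-disk geometry and the offset are made up):

  def full_stripe_fraction(write_bytes, n_disks, parity_disks,
                           chunk_bytes, offset):
      # Fraction of the write covered by whole stripes, i.e. the
      # part that needs no RMW.
      stripe = (n_disks - parity_disks) * chunk_bytes
      start, end = offset, offset + write_bytes
      first = -(-start // stripe) * stripe   # round start up
      last = (end // stripe) * stripe        # round end down
      return max(0, last - first) / write_bytes

  # 64 KiB write at an arbitrary offset on a 5-disk raid5:
  print(full_stripe_fraction(64 * 1024, 5, 1, 512 * 1024, 12345))  # 0.0
  print(full_stripe_fraction(64 * 1024, 5, 1, 4 * 1024, 12345))    # 0.75

With 512 KiB chunks the whole write is RMW; with 4k chunks three
quarters of it goes out as full stripes.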

Phil
