David and Neil, thanks for the hints!

(I was busy with other things lately, but believe it or not, I got the
"why not try raid 10 with only 2 partitions" idea just last night,
tested it a couple of minutes ago with fascination, and now here I am
reading your emails - please do not remind me again of the time wasted :)

The write performance is curious though:
- f2: 147 MB/s
- n2: 162 MB/s

I was expecting a greater difference (but I must admit this was not
tested on the whole 3TB disk, just a 400GB partition on it).

b.

On 12 September 2014 10:49, David Brown <david.brown@xxxxxxxxxxxx> wrote:
> On 10/09/14 23:24, Bostjan Skufca wrote:
>> Hi,
>>
>> I have a simple question:
>> - Where is the code that is used for actual RAID 10 creation? In the
>> kernel or in mdadm?
>>
>> Explanation:
>>
>> I was dissatisfied with single-threaded RAID 1 sequential read
>> performance (it basically boils down to the speed of one disk). I
>> figured that instead of using level 1 I could create a RAID level 10
>> array and use two equally-sized partitions on each drive (instead of
>> one).
>>
>> It turns out that if the array is created properly, it is capable of
>> sequential reads at almost 2x single-device speed, as expected (on
>> SSD!) and as anyone would expect from ordinary RAID 1.
>>
>> What does "properly" actually mean?
>> I was doing some benchmarks with various raid configurations and
>> found that the order of devices submitted to the creation command is
>> significant. It also determines whether a raid10 created this way
>> survives a device failure (not a partition failure - a device
>> failure, which means that two underlying raid members fail at once).
>>
>> Sum:
>> - if such an array is created properly, it has redundancy in place
>> and performs as expected
>> - if not, it performs as raid1 and fails with one physical disk failure
>>
>> I am trying to find the code responsible for creation of RAID 10, in
>> order to try and make it more intelligent about where to place RAID
>> 10 parts when it gets a list of devices to use and some of those
>> devices are on the same physical disks.
>>
>> Thanks for hints,
>> b.
>>
>> PS: More details about the testing are available here, but be warned,
>> it is still a bit hectic to read:
>> http://blog.a2o.si/2014/09/07/linux-software-raid-why-you-should-always-use-raid-10-instead-of-raid-1/
>
> Hi,
>
> First let me applaud your enthusiasm for trying to inform people about
> raid in your blog, and your interest in investigating different ideas
> in the hope of making md raid faster and/or easier and/or safer.
>
> Then let me tell you that your entire blog post is wasted, because md
> already has a solution that is faster, easier and safer than anything
> you have come up with so far.
>
> You are absolutely correct about the single-threaded read performance
> of raid1 pairs - for a number of reasons, a single-threaded read will
> be served from only one disk. This is not a problem in many cases,
> because you often have multiple simultaneous reads on "typical"
> systems with raid1. But in some cases, such as a high-performance
> desktop, it can be a limitation.
>
> You are also correct that the solution is basically to split the
> drives into two parts, pair up halves from each disk as raid1 mirrors,
> and stripe the two mirrors as raid0.
>
> And you are correct that you have to get the sets right, or you may
> lose redundancy and/or speed.
>
> Fortunately, Neil and the other md raid developers are way ahead of you.
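(As an aside on the ordering issue above - a minimal sketch, using
hypothetical device names and assuming the default near-copies (n2)
layout, in which adjacent devices in the member list are paired up as
mirrors:

    # each adjacent pair (sda1+sdb1, sda2+sdb2) spans both physical
    # disks, so the array survives the loss of a whole disk
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
          /dev/sda1 /dev/sdb1 /dev/sda2 /dev/sdb2

    # listing the partitions per disk instead pairs sda1 with sda2,
    # so a single failed disk takes out both halves of one mirror
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
          /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2

Only the first ordering gives the redundancy described in the "Sum"
above.)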
> Neil gave you the pointers in one of his replies, but I suspect you
> did not understand that Linux raid10 is not limited to the arrangement
> of traditional raid10, and thus did not see his point.
>
> md raid and mdadm already support a very flexible form of raid10.
> Unlike traditional raid10, which requires a multiple of 4 disks, Linux
> raid10 can work with /any/ number of disks greater than 1. There are
> various layouts that can be used for this - the Wikipedia entry gives
> some useful diagrams:
>
> <http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>
>
> You can also read about it in the mdadm manual page, and in various
> documents and resources around the web.
>
> In your particular case, what you want is to use "--layout raid10,f2"
> on your two disks. This asks md to split each disk (or the partitions
> you use) into two parts, without creating any new partitions. The
> first half of disk 1 is mirrored with the second half of disk 2, and
> vice versa, then these mirrors are striped. This is very similar to
> the layout you are trying to achieve, except for four points:
>
> 1. The mirrors are crossed over, so that a first half is mirrored with
> a second half. This makes no difference on an SSD, but makes a huge
> difference on a hard disk.
>
> 2. mdadm and md raid get the ordering right every time - there is no
> need to worry about the ordering of the two disks.
>
> 3. You don't need extra partitions, automatic detection works, and the
> layout has one less layer, meaning less complexity and lower latency
> and overheads.
>
> 4. md raid knows more about the layout, and can use it to optimise the
> speed.
>
> In particular, md will (almost) always read from the outer halves of
> the disks. On a hard disk, this can be twice the speed of the inner
> tracks.
>
> Obviously you pay a penalty in writing when you have such an
> arrangement - writes need to go to both disks and involve significant
> head movement. There are other raid10 layouts that have lower streamed
> read speeds but also lower write latencies (choose the balance you
> want).
>
> With this in mind, I hope you can try out the raid10,f2 layout on your
> system and then change your blog to show how easy this all is with md
> raid, how practical it is for a fast workstation or desktop, and how
> much faster such a setup is than anything that can be achieved with
> hardware raid cards or anything other than md raid.
>
> mvh.,
>
> David
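(For completeness, a minimal sketch of the two-disk raid10,f2 setup
described above - the device names are placeholders, not taken from the
thread:

    # two members, far-copies layout with 2 copies
    mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 \
          /dev/sda1 /dev/sdb1

    # confirm the layout md actually used
    mdadm --detail /dev/md0 | grep -i layout

The n2 variant benchmarked at the top of this message is selected with
--layout=n2 instead; as David notes, far layouts favour streamed reads
while near layouts keep write latency lower on rotating disks.)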