Re: RAID-1 can (sometimes) be 3x faster than RAID-10

Robin Hill <robin@xxxxxxxxxxxxxxx> · Thu, 30 May 2019 21:19:41 +0100

On Thu May 30, 2019 at 10:04:56PM +0200, keld@xxxxxxxxxx wrote:

> On Thu, May 30, 2019 at 06:08:53PM +0000, Andy Smith wrote:
> > Hi keld,
> > 
> > Thanks for the reply.
> > 
> > On Thu, May 30, 2019 at 12:04:20PM +0200, keld@xxxxxxxxxx wrote:
> > > you need to clarify which layout you use with md raid10.
> > 
> > I did not bother as I included the commands for the array setup
> > which should indicate that default layout was used.
> 
> 
> yes it did. but it was hidden way down in the extended article.
> 
>  
> > > the layouts are near, far and offset, with very different performance characteristics.
> > 
> > I did not think these would be of any interest on SSD/NVMe which is
> > my main concern and is the area where RAID-1 outperforms RAID-1 by a
> > factor of 3 for 100% 4KiB random reads.
> 
> i think the latter raid-1 should read "md raid10,near".
> yes that is indeed strange, and probably due to the code being written with HDs in mind.
> 
> > 
> > > far and offset are designed to be faster than near, which I understand that you use.
> > > So why are you using the slowest md raid10 layout, and not mentioning this fact?
> > 
> > Because I did not see the point of a non-default layout for fast
> > flash devices.
>  
> 
> i can understand your pov, but due to differences in the drivers it may actually matter.
> 
> and maybe we can optimize the code a little for ssds.
> I have in mind some patches for the far layout, where the higher blocks are actually
> faster than the lower blocks. is this also true for ssds?
> 
No - there's not even any direct connection between the offset you're
writing to and the block on the drive that's written. The drive firmware
remaps everything dynamically for wear levelling and to avoid erasing
blocks during a write cycle (as the erase is slow).
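
As a rough illustration of that remapping, here is a toy sketch in
Python. It is not how any real firmware works; the ToyFTL class and its
"always write to the next free page" policy are made up purely to show
that the same logical offset lands on a different physical page each
time it is written.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical, for illustration only)."""

    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.mapping = {}      # logical block address -> physical page
        self.stale = set()     # pages holding old data, waiting for an erase
        self.next_free = 0     # next never-written physical page

    def write(self, lba):
        """Write logical block `lba`; return the physical page used."""
        if self.next_free >= self.n_pages:
            # A real drive would garbage-collect stale pages here.
            raise RuntimeError("toy model has no garbage collection")
        old = self.mapping.get(lba)
        if old is not None:
            # No erase in the write path: the old page is just marked stale.
            self.stale.add(old)
        self.mapping[lba] = self.next_free
        self.next_free += 1
        return self.mapping[lba]


ftl = ToyFTL(n_pages=8)
first = ftl.write(1000)    # a "high" offset
other = ftl.write(0)       # a "low" offset
again = ftl.write(1000)    # the same high offset, rewritten
print(first, other, again)
```

In this model the physical page depends only on write order, never on
whether the offset is high or low.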

> 
> > > maybe you could run your tests for all 3 layouts?
> > 
> > Yes I will be happy to do this and see what happens but I'm not
> > optimistic that it will change matters so that RAID-10 is able to
> > direct most reads to the fastest half.
> 
> which is the fastest half? does that apply to all ssds/nvme?
> 
"fastest half" means the fastest half of the mirror - the NVMe drive, as
opposed to the slower SSD.

I suspect the slowdown is because the RAID-10 code has no optimisation
for the 2-drive case, so it can't assume that every drive holds all the
data - it therefore just cycles through the array members and issues the
next read to the next drive each time. With RAID-1 it can always issue
the next read to the first available drive (as each copy contains all
the data) and therefore take advantage of the NVMe performance.
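
To show why that would matter, here is a toy model in Python. It is only
a sketch of the two dispatch policies as I've described them, not the
actual md code; the function names and the service times (1 time unit
per read on the fast drive, 4 on the slow one) are made up.

```python
def round_robin(n_reads, service_times):
    """Issue read i to drive i % n, regardless of how busy that drive is."""
    busy_until = [0.0] * len(service_times)
    for i in range(n_reads):
        d = i % len(service_times)
        busy_until[d] += service_times[d]
    return max(busy_until)   # time at which the last read completes


def first_available(n_reads, service_times):
    """Issue each read to whichever drive becomes idle first."""
    busy_until = [0.0] * len(service_times)
    for _ in range(n_reads):
        d = min(range(len(service_times)), key=lambda k: busy_until[k])
        busy_until[d] += service_times[d]
    return max(busy_until)


drives = [1.0, 4.0]          # hypothetical: fast NVMe, slower SSD
print(round_robin(1000, drives))
print(first_available(1000, drives))
```

With these made-up numbers the round-robin policy is limited by the slow
drive (2000 time units for 1000 reads), while the first-available policy
sends most reads to the fast drive and finishes in 800.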

Cheers,
    Robin