Re: unexpected speed difference at RAID initialisation

On Sat, 09 May 2015 00:47:19 +0200 Christoph Anton Mitterer
<calestyo@xxxxxxxxxxxx> wrote:

> Hey.
> 
> I'm just deploying some new servers at the faculty, where I made the
> following strange observation, which I cannot explain.
> 
> All nodes have exactly the same hardware (all recent stuff, some 15k€
> Dell servers with 16 disks à 6 TB, plenty of CPU), the same
> BIOS/firmware config, the same OS (Debian jessie, except the kernel
> 4.0.0 from experimental and btrfs-tools 4.0 from sid) with identical
> config.
> 
> The disks are connected via some Dell PERC RAID controller, but for
> testing they're exported as JBODs.
> 
> Nothing except some standard daemons (haveged, irqbalance and the like)
> is running on these nodes.
> 
> 
> I created an MD RAID6 over all disks via:
> mdadm --create /dev/md/data-test-raid --verbose --metadata=1.2 \
>   --size=max --chunk=512K --level=raid6 --bitmap=internal \
>   --name=data-test-raid --raid-devices=16 --spare-devices=0 \
>   /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh \
>   /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp
> basically at the same time (a few seconds' difference) on both nodes.
> 
> 
> But looking at the initial rebuild on two nodes, one can see substantial
> speed differences:
> node A: 
> # cat /proc/mdstat 
> Personalities : [raid6] [raid5] [raid4] 
> md127 : active raid6 sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
>       82045479936 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>       [=====>...............]  resync = 28.8% (1691996416/5860391424) finish=841.8min speed=82526K/sec
>       bitmap: 32/44 pages [128KB], 65536KB chunk
> 
> unused devices: <none>
> 
> node B: 
> # cat /proc/mdstat 
> Personalities : [raid6] [raid5] [raid4] 
> md127 : active raid6 sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
>       82045479936 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>       [====>................]  resync = 20.1% (1180137984/5860391424) finish=1496.2min speed=52132K/sec
>       bitmap: 36/44 pages [144KB], 65536KB chunk
> 
> unused devices: <none>
> 
> (again taken with only a few seconds in between).
> 
> As you can see, the two already show different speeds (~80000K/s for
> node A and ~50000K/s for node B).

This could be quite normal.

When resyncing an array, md will read a full stripe and check if it is in
sync. If it is, it just continues.
If not, it schedules a write with the correct parity.

So if the array happens to be fully in-sync, this goes at maximum speed
(sequential reads only).
If it is totally out of sync, it goes very slowly (read/write/read/write...).
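
(If you want to watch this while it runs: the averaged resync rate and the
throttle limits can be read directly, e.g.

  # cat /sys/block/md127/md/sync_speed
  # cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max

assuming the array really is md127 as in your mdstat output - untested here,
but those files should exist for any md array.)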

What you are probably seeing is that all the drives in one array are (almost)
completely zero, so it seems in-sync.  In the other array, one drive might
have old data on it so syncing is needed.
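
A crude way to test that theory is to sample a region of one member disk,
well past the md metadata and bitmap, and see whether it is all zeros.
Something like (untested; the device and the 4 GiB offset are just an
arbitrary example):

  # dd if=/dev/sdc bs=1M skip=4096 count=64 2>/dev/null | od -An -tx1 | head

An all-zero region collapses to a single line of 00s followed by '*' in
od's output; anything else is old data that will force writes during the
resync.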

(snip)

> 
> There's a bigger difference at:
> [ 8334.341904] raid6: using algorithm avx2x4 (24764 MB/s)
> vs.
> [ 8269.712632] raid6: using algorithm avx2x4 (25242 MB/s)
> 
> How are these numbers determined?

Holding a finger up to the wind and seeing how cold it gets.

Well, it really runs a loop performing the calculation over and over and
times how long that takes.
It doesn't take cache effects into account properly so it isn't completely
reliable, but it is reasonably indicative.
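
The full table of per-algorithm results is logged when the raid6 code is
loaded, so you can compare the two nodes directly with something along the
lines of:

  # dmesg | grep raid6

The absolute numbers will wobble a bit from run to run for exactly that
reason; a couple of percent between otherwise identical machines means
nothing.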


> 
> 
> Further:
> node A:
> # ps ax  | grep md127
>   7550 ?        S    273:47 [md127_raid6]
>   7552 ?        D     79:43 [md127_resync]
> 
> 
> node B:
> # ps ax  | grep md127
>   7494 ?        R    251:30 [md127_raid6]
>   7495 ?        D     63:48 [md127_resync]
> 
> 
> 
> Any ideas where this performance difference could come from?

Probably just different initial content of devices.

If it is still going, you could look at the IO stats.  One array might see a
lot more writes than the other.
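
For example (untested here), either watch extended iostat output for a
while on each node (iostat is in the sysstat package):

  # iostat -x 5

or just compare the per-device write counters in /proc/diskstats:

  # grep -E ' sd[a-p] ' /proc/diskstats

On the node that is genuinely out of sync you should see a steady stream of
writes alongside the reads, while on the other node the write counters
should stay close to zero.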

NeilBrown


