Hello Roger.

I just ran atop on different and much better hardware while doing mdadm --grow on a raid5 with 4 drives, and it shows the following:

DSK | sdl | busy 90% | read  950 | write 502 | KiB/r 1012 | KiB/w 506 | MBr/s 94.0 | MBw/s 24.9 | avq 1.29 | avio 6.22 ms |
DSK | sdk | busy 89% | read  968 | write 499 | KiB/r  995 | KiB/w 509 | MBr/s 94.1 | MBw/s 24.8 | avq 0.92 | avio 6.09 ms |
DSK | sdj | busy 88% | read 1004 | write 503 | KiB/r  958 | KiB/w 505 | MBr/s 94.0 | MBw/s 24.8 | avq 0.66 | avio 5.91 ms |
DSK | sdi | busy 87% | read 1013 | write 499 | KiB/r  949 | KiB/w 509 | MBr/s 94.0 | MBw/s 24.8 | avq 0.65 | avio 5.81 ms |

Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md3 : active raid5 sdi1[5] sdl1[6] sdk1[4] sdj1[2]
      46877237760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=================>...]  resync = 88.5% (13834588672/15625745920) finish=293.1min speed=101843K/sec
      bitmap: 8/59 pages [32KB], 131072KB chunk

Surprisingly, all 4 drives show approximately 94 MB/s read and 25 MB/s write. Even though each of the drives can read 270 MB/s and write 250 MB/s, the sync speed is only 100 MB/s. Why? Does --grow differ from --add?

Thanks,
Jaromir

---------- Original e-mail ----------
From: Roger Heflin <rogerheflin@xxxxxxxxx>
To: Wols Lists <antlists@xxxxxxxxxxxxxxx>
Date: 11. 1. 2022 1:15:17
Subject: Re: Feature request: Add flag for assuming a new clean drive completely dirty when adding to a degraded raid5 array in order to increase the speed of the array rebuild

I just did a "--add" with sdd on a raid6 array missing a volume, and here is what sar shows:

06:08:12 PM sdb  91.03 34615.97     0.36 0.00 380.26 0.41  4.47 30.31
06:08:12 PM sdc   0.02     0.00     0.00 0.00   0.00 0.00  0.00  0.00
06:08:12 PM sdd  77.12    26.28 34563.36 0.00 448.54 0.64  8.23 27.40
06:08:12 PM sde  36.45 34598.82     0.36 0.00 949.22 1.43 38.78 70.37
06:08:12 PM sdf  46.87 34598.89     0.36 0.00 738.25 1.23 26.13 57.81
06:09:12 PM sda   5.12     0.93    75.33 0.00  14.91 0.01  1.48  0.39
06:09:12 PM sdb 122.57 46819.67     0.40 0.00 382.00 0.54  4.38 35.85
06:09:12 PM sdc   0.00     0.00     0.00 0.00   0.00 0.00  0.00  0.00
06:09:12 PM sdd 105.92     0.00 46775.73 0.00 441.63 1.12 10.53 35.80
06:09:12 PM sde  48.47 46817.53     0.40 0.00 965.98 1.95 40.00 97.89
06:09:12 PM sdf  56.95 46834.53     0.40 0.00 822.39 1.73 30.32 82.33
06:10:12 PM sda   4.55     1.20    48.20 0.00  10.86 0.01  0.97  0.27
06:10:12 PM sdb 123.67 46616.93     0.40 0.00 376.96 0.52  4.15 34.66
06:10:12 PM sdc   0.00     0.00     0.00 0.00   0.00 0.00  0.00  0.00
06:10:12 PM sdd 109.82     0.00 46623.40 0.00 424.56 1.30 11.80 36.15
06:10:12 PM sde  49.18 46602.00     0.40 0.00 947.52 1.93 39.17 97.27
06:10:12 PM sdf  54.88 46601.07     0.40 0.00 849.10 1.75 31.82 85.16
06:11:12 PM sda   4.07     1.00    50.80 0.00  12.74 0.01  1.77  0.30
06:11:12 PM sdb 121.93 46363.20     0.40 0.00 380.24 0.51  4.10 34.72
06:11:12 PM sdc   0.00     0.00     0.00 0.00   0.00 0.00  0.00  0.00
06:11:12 PM sdd 109.58     0.00 46372.47 0.00 423.17 1.37 12.44 35.69
06:11:12 PM sde  49.38 46371.00     0.40 0.00 939.01 1.93 38.88 97.09
06:11:12 PM sdf  55.12 46352.53     0.40 0.00 841.00 1.73 31.39 85.25
06:12:12 PM sda   5.75    14.20    79.05 0.00  16.22 0.01  1.78  0.40
06:12:12 PM sdb 120.73 45994.13     0.40 0.00 380.97 0.51  4.20 34.72
06:12:12 PM sdc   0.00     0.00     0.00 0.00   0.00 0.00  0.00  0.00
06:12:12 PM sdd 110.95     0.00 45982.87 0.00 414.45 1.43 12.81 35.39
06:12:12 PM sde  49.63 46020.46     0.40 0.00 927.37 1.91 38.39 96.18
06:12:12 PM sdf  54.27 46022.80     0.40 0.00 847.97 1.75 32.14 86.65

So there are very few reads going on for sdd, but a lot of reads of the other disks to recalculate what the data on that disk should be.
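For illustration only (this is not the md driver's code), a minimal Python sketch of that reconstruction, assuming the simple raid5 single-failure case: each chunk of the replaced disk is recomputed as the XOR of the matching chunks on the surviving members, which is why the new disk sees almost pure writes while the existing members see almost pure reads. (raid6 additionally maintains a second "Q" syndrome, so its math is more involved, but the read/write pattern is the same.)

    def rebuild_chunk(surviving_chunks):
        """XOR together the matching chunks read from the surviving raid5 members."""
        out = bytearray(len(surviving_chunks[0]))
        for chunk in surviving_chunks:
            for i, b in enumerate(chunk):
                out[i] ^= b
        return bytes(out)

    # Toy example with 3 surviving members of a 4-disk raid5:
    d0 = bytes([0x11] * 8)                  # data chunk read from one surviving disk
    d1 = bytes([0x22] * 8)                  # data chunk read from another surviving disk
    p  = bytes([0x11 ^ 0x22 ^ 0x33] * 8)    # parity chunk read from the third surviving disk
    print(rebuild_chunk([d0, d1, p]))       # -> the missing 0x33 chunk, written to the new disk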
This is on raid6, but if raid6 is not doing a pointless check read on a new disk add, I would not expect raid5 to be either. This is on a 5.14 kernel.

On Mon, Jan 10, 2022 at 5:15 PM Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:

On 09/01/2022 14:21, Jaromír Cápík wrote:
> In case of huge arrays (48TB in my case) the array rebuild takes a couple of
> days with the current approach, even when the array is idle, and during that
> time any of the drives could fail, causing a fatal data loss.
>
> Does it make at least a bit of sense, or are my understanding and assumptions
> wrong?

It does make sense, but have you read the code to see if it already does it? And if it doesn't, someone's going to have to write it, in which case it doesn't make sense not to have that as the default.

Bear in mind that rebuilding the array with a new drive is completely different logic from doing an integrity check, so it will need its own code, and I expect it already works that way.

I think you've got two choices. Firstly, raid or not, you should have backups! Raid is for high availability, not for keeping your data safe! And secondly, go raid-6, which gives you that bit of extra redundancy.

Cheers,
Wol
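For what it's worth, the "couple of days" figure follows directly from the mdstat output quoted at the top of this thread. A quick back-of-the-envelope check in plain Python (the constants are copied from that output; the only assumption is that the resync counters are in KiB, which is how mdstat reports them) reproduces both the reported finish estimate and the full-pass time:

    member_kib  = 15625745920   # per-member size from the resync line (KiB)
    done_kib    = 13834588672   # KiB already resynced
    speed_kib_s = 101843        # reported sync speed, KiB/s

    remaining_min = (member_kib - done_kib) / speed_kib_s / 60
    full_pass_h   = member_kib / speed_kib_s / 3600

    print(f"progress  : {done_kib / member_kib:.1%}")   # ~88.5%, as reported
    print(f"finish in : {remaining_min:.1f} min")       # ~293.1 min, matching mdstat
    print(f"full pass : {full_pass_h:.1f} h")           # ~42.6 h, i.e. a couple of days

In other words, what sets the rebuild window on a 48TB array is the roughly 100 MB/s per-member sync rate actually achieved, not the 250+ MB/s the drives can stream in isolation.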