On 02.11.23 at 13:29, eyal@xxxxxxxxxxxxxx wrote:
See update further down.
Interestingly, after about 1.5 hours, when there was about 1GB of dirty
blocks, the whole lot was cleared quickly:
2023-11-02 23:08:49 Dirty: 1018924 kB
2023-11-02 23:08:59 Dirty: 1018640 kB
2023-11-02 23:09:09 Dirty: 1018732 kB
2023-11-02 23:09:19 Dirty: 592196 kB
2023-11-02 23:09:29 Dirty: 1188 kB
2023-11-02 23:09:39 Dirty: 944 kB
2023-11-02 23:09:49 Dirty: 804 kB
2023-11-02 23:09:59 Dirty: 60 kB
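(A log like that is trivial to collect with a small shell loop; this is
only a sketch, not necessarily how the numbers above were captured:

while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') Dirty: $(awk '/^Dirty:/ {print $2, $3}' /proc/meminfo)"
    sleep 10
done
)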
And iostat saw it too:
          Device      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
23:09:12  md127      2.80         0.00        40.40         0.00          0        404          0
23:09:22  md127   1372.33         0.80     47026.17         0.00          8     470732          0
23:09:32  md127     75.80         0.80     54763.20         0.00          8     547632          0
23:09:42  md127      0.00         0.00         0.00         0.00          0          0          0
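(A report like that comes from plain iostat; the exact invocation is a
guess, but something along these lines produces the same columns, as
recent sysstat adds the kB_dscd fields by default:

iostat -d -k -t 10 md127

With -t the timestamp is printed on its own line above each interval, so
the inline timestamps above were presumably merged in afterwards.)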
It's pretty simple: RAID6 behaves terribly in a degraded state, especially
*with rotating disks*, and for goodness' sake, as long as it is degraded and
not fully rebuilt you should avoid any load that isn't strictly necessary.
The chance that another disk dies increases, especially during the
rebuild phase, and then you can start praying, because the next unrecoverable
read error will kill the array.
A RAID10 couldn't care less at that point because it doesn't need to seek
like crazy on the drives.
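If the array has to stay in service anyway, at least watch the rebuild and
give it as much bandwidth as you can spare; the standard md knobs for that
are below (the values are only an example, adjust to taste):

cat /proc/mdstat                                   # rebuild progress and ETA
echo 50000  > /proc/sys/dev/raid/speed_limit_min   # kB/s the resync may always claim
echo 500000 > /proc/sys/dev/raid/speed_limit_max   # upper limit for the resync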
---------
What I don't understand is why people don't have a replacement disk on
the shelf for every array they operate: replace the drive and leave it
in peace until the rebuild is finished.
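The replacement itself is only a handful of mdadm commands; the device
names below are placeholders, md127 is taken from the iostat output above:

mdadm /dev/md127 --fail /dev/sdX      # mark the dying disk as failed, if the kernel hasn't already
mdadm /dev/md127 --remove /dev/sdX    # take it out of the array
mdadm /dev/md127 --add /dev/sdZ       # add the new disk, the rebuild starts on its own
cat /proc/mdstat                      # and watch until the recovery is done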
I am responsible for 7 machines at 5 locations with mdadm RAIDs of
different sizes and there is a replacement disk for each of them - if a
disk dies or smartd complains, it gets replaced and the next drive is
ordered.
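For completeness: the smartd side is a single line in /etc/smartd.conf
(the mail address is obviously just an example):

# monitor all SMART attributes on every disk found and send mail on trouble
DEVICESCAN -a -m root@example.com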