On 02.11.23 at 13:29, eyal@xxxxxxxxxxxxxx wrote:
See update further down.
Interestingly, after about 1.5 hours, when there was about 1GB of dirty
blocks, the whole lot was cleared quickly:
2023-11-02 23:08:49 Dirty: 1018924 kB
2023-11-02 23:08:59 Dirty: 1018640 kB
2023-11-02 23:09:09 Dirty: 1018732 kB
2023-11-02 23:09:19 Dirty: 592196 kB
2023-11-02 23:09:29 Dirty: 1188 kB
2023-11-02 23:09:39 Dirty: 944 kB
2023-11-02 23:09:49 Dirty: 804 kB
2023-11-02 23:09:59 Dirty: 60 kB
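(A log like that is trivial to collect with a small shell loop; this is
only a sketch, not necessarily how the numbers above were captured:

while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') Dirty: $(awk '/^Dirty:/ {print $2, $3}' /proc/meminfo)"
    sleep 10
done
)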
And iostat saw it too:
          Device      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
23:09:12  md127      2.80         0.00        40.40         0.00          0        404          0
23:09:22  md127   1372.33         0.80     47026.17         0.00          8     470732          0
23:09:32  md127     75.80         0.80     54763.20         0.00          8     547632          0
23:09:42  md127      0.00         0.00         0.00         0.00          0          0          0
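(A report like that comes from plain iostat; the exact invocation is a
guess, but something along these lines produces the same columns, as
recent sysstat adds the kB_dscd fields by default:

iostat -d -k -t 10 md127

With -t the timestamp is printed on its own line above each interval, so
the inline timestamps above were presumably merged in afterwards.)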
It's pretty simple: RAID6 behaves terribly in a degraded state, especially
*with rotating disks*, and for goodness' sake, as long as it is degraded and
not fully rebuilt you should avoid any load that isn't strictly necessary.
The chance that another disk dies increases, especially during the
rebuild phase, and then you can start praying, because the next unrecoverable
read error will kill the array.
A RAID10 couldn't care less at that point because it doesn't need to seek
like crazy on the drives.
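If the array has to stay in service anyway, at least watch the rebuild and
give it as much bandwidth as you can spare; the standard md knobs for that
are below (the values are only an example, adjust to taste):

cat /proc/mdstat                                   # rebuild progress and ETA
echo 50000  > /proc/sys/dev/raid/speed_limit_min   # kB/s the resync may always claim
echo 500000 > /proc/sys/dev/raid/speed_limit_max   # upper limit for the resync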
---------
What I don't understand is why people don't have a replacement disk on
the shelf for every array they operate: replace the drive and leave it
in peace until the rebuild is finished.
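The replacement itself is only a handful of mdadm commands; the device
names below are placeholders, md127 is taken from the iostat output above:

mdadm /dev/md127 --fail /dev/sdX      # mark the dying disk as failed, if the kernel hasn't already
mdadm /dev/md127 --remove /dev/sdX    # take it out of the array
mdadm /dev/md127 --add /dev/sdZ       # add the new disk, the rebuild starts on its own
cat /proc/mdstat                      # and watch until the recovery is done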
I am responsible for 7 machines at 5 locations with mdadm RAIDs of
different sizes and there is a replacement disk for each of them - if a
disk dies or smartd complains, it gets replaced and the next drive is
ordered.
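For completeness: the smartd side is a single line in /etc/smartd.conf
(the mail address is obviously just an example):

# monitor all SMART attributes on every disk found and send mail on trouble
DEVICESCAN -a -m root@example.com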