Nope, I haven't read the code. I only see a low sync speed (fluctuating between 20 and 80 MB/s), while the drives can perform much better at sequential reads and writes (250 MB/s per drive and up to 600 MB/s for all 4 drives in total). During the sync I hear a loud noise caused by the heads flying back and forth, and that smells wrong. The chosen drives have poor seeking performance and small caches, and are probably unable to reorder the operations into something more sequential.

The whole solution is 'economic', since the organisation owning it is poor and cannot afford better hardware. That also means RAID6 is not an option. But we shouldn't look for excuses in the chosen scenario when the code is potentially suboptimal :] We're trying to make Linux better, right? :]

I'm looking for someone who knows the code well and can either confirm my findings or point me at anything I could try in order to increase the rebuild speed. So far I've tried changing the readahead, the minimum resync speed and the stripe cache size, but that increased the resync speed by only a few percent.

I believe I would be able to write my own userspace application that rebuilds the array offline at a much higher speed, simply XORing the bytes at the same offsets. That would prove the current rebuild strategy is suboptimal. Of course it would mean new code if it doesn't work as suggested, and I know that could be difficult and would require a deep knowledge of the linux-raid code that I unfortunately don't have.

Any chance someone here could find time to look at that?

Thank you,
Jaromir Capik

On 09/01/2022 14:21, Jaromír Cápík wrote:
>> In case of huge arrays (48TB in my case) the array rebuild takes a couple of
>> days with the current approach even when the array is idle and during that
>> time any of the drives could fail causing a fatal data loss.
>>
>> Does it make at least a bit of sense or my understanding and assumptions
>> are wrong?
> It does make sense, but have you read the code to see if it already does it?
> And if it doesn't, someone's going to have to write it, in which case it
> doesn't make sense not to have that as the default.
>
> Bear in mind that rebuilding the array with a new drive is completely
> different logic to doing an integrity check, so will need its own code,
> so I expect it already works that way.
>
> I think you've got two choices. Firstly, raid or not, you should have
> backups! Raid is for high-availability, not for keeping your data safe!
> And secondly, go raid-6 which gives you that bit extra redundancy.
>
> Cheers,
> Wol
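P.S. For reference, these are the kinds of knobs I was tuning above. The device name (md0) and the specific values are just examples, not the settings from my box:

```shell
# Raise the md resync speed floor/ceiling (KiB/s, per device).
echo 200000 > /proc/sys/dev/raid/speed_limit_min
echo 800000 > /proc/sys/dev/raid/speed_limit_max

# Enlarge the raid456 stripe cache (in pages per device).
echo 8192 > /sys/block/md0/md/stripe_cache_size

# Bump readahead on the array (in 512-byte sectors).
blockdev --setra 65536 /dev/md0
```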
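P.P.S. A minimal sketch of the offline rebuild idea I described above, just to make it concrete. It relies on the fact that in RAID5 the chunk on the failed member equals the XOR of the chunks at the same offset on all surviving members, whether the missing chunk held data or parity, so the whole rebuild can be one purely sequential pass. The function name and the 1 MiB read size are my assumptions, not anything from the md code:

```python
CHUNK = 1024 * 1024  # large sequential reads to keep the heads from seeking


def rebuild(surviving_paths, replacement_path):
    """XOR the surviving members, offset by offset, onto the replacement.

    Assumes all member devices/files have the same size, as in a real array.
    """
    sources = [open(p, "rb") for p in surviving_paths]
    try:
        with open(replacement_path, "wb") as out:
            while True:
                blocks = [f.read(CHUNK) for f in sources]
                if not blocks[0]:
                    break
                # XOR the blocks as big integers -- fast enough for a sketch.
                acc = int.from_bytes(blocks[0], "little")
                for b in blocks[1:]:
                    acc ^= int.from_bytes(b, "little")
                out.write(acc.to_bytes(len(blocks[0]), "little"))
    finally:
        for f in sources:
            f.close()
```

Of course the real md rebuild also has to cope with the chunk layout, bitmaps and concurrent writes, which is exactly why I'd only use something like this to measure the upper bound on sequential rebuild speed.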