On 15/05/2022 19:39, Pascal Hambourg wrote:
On 14/05/2022 15:46, Wols Lists wrote:
Or the rewrite fails, raid assumes the drive is faulty and kicks it
out. That's why you should never use desktop drives unless you know
EXACTLY what you are doing!
What's wrong with desktop drives?
Once things start going wrong, they go pear-shaped very fast.
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
tl;dr
RAID/enterprise drives have something called SCT ERC (Error Recovery
Control). If there's a problem, the drive gives up on the read/write
after a set time limit and returns an error.
Consumer drives don't have this. If there's a problem, they can
typically take two minutes to respond. No matter whether the problem is
transient or real, that's a real bummer for whatever wants the data. The
kernel typically gives up waiting after 30 seconds, tries to talk to the
drive again, and on getting no response whatsoever assumes the disk has
failed. As far as raid is concerned, a faulty, non-responsive disk is
BAD NEWS.
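A minimal Python sketch of the mismatch described above. It assumes a
Linux system where the kernel's per-device command timeout is exposed at
/sys/block/<dev>/device/timeout (as it is for SCSI/SATA disks). The
helper names and the fallback to 30 seconds are illustrative only, not
part of any RAID tool.

```python
from pathlib import Path

# Typical default kernel command timeout, in seconds.
DEFAULT_KERNEL_TIMEOUT_S = 30


def read_kernel_timeout(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the kernel command timeout (seconds) for a device such as "sda".

    Reads <sysfs_root>/<device>/device/timeout and falls back to the
    default if the file is missing or unreadable (e.g. NVMe devices).
    """
    path = Path(sysfs_root) / device / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return DEFAULT_KERNEL_TIMEOUT_S


def timeout_mismatch(drive_recovery_s: float, kernel_timeout_s: float) -> bool:
    """True if the drive may still be retrying when the kernel gives up on it."""
    return drive_recovery_s >= kernel_timeout_s
```

With a consumer drive that retries for two minutes, timeout_mismatch(120,
read_kernel_timeout("sda")) comes out True on a default setup: the kernel
stops waiting long before the drive answers.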
It gets worse. SMR drives can, in the NORMAL course of events, take
about ten minutes to respond!
So basically, enterprise drives typically take about 7 seconds to sort
out a problem. Consumer drives (the old CMR type) typically take about
2 minutes. New SMR drives can take ten minutes or more. And transient
problems aren't that uncommon. Worse, once things start going wrong, it
can explode very fast.
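To put those figures side by side, here is a rough Python sketch that
flags which drive classes outlast a typical 30-second kernel timeout.
The numbers are the ballpark figures from this post, not measurements,
and the names are hypothetical.

```python
# Ballpark error-recovery times from the post, in seconds.
TYPICAL_RECOVERY_S = {
    "enterprise (SCT ERC)": 7,
    "consumer CMR": 2 * 60,
    "consumer SMR": 10 * 60,
}

KERNEL_TIMEOUT_S = 30  # typical kernel default


def at_risk(recovery_times: dict[str, float], kernel_timeout_s: float) -> list[str]:
    """Drive classes whose recovery time reaches or exceeds the kernel timeout."""
    return [name for name, t in recovery_times.items() if t >= kernel_timeout_s]


if __name__ == "__main__":
    for name in at_risk(TYPICAL_RECOVERY_S, KERNEL_TIMEOUT_S):
        print(f"{name}: likely to be kicked out of the array on a slow read")
```

Only the enterprise class finishes its recovery inside the 30-second
window; both consumer classes are still busy when the kernel gives up.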
Cheers,
Wol