Re: My array won't assemble

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Fri, 21 Dec 2018 00:45:57 +0000

On 20/12/18 17:21, Alexis BRENON wrote:
> You recommend to change the WD (which do not support ERT) instead of
> the faulty SG?

> Sorry for my poor English, but what do you mean by: "you're probably
> fine to do a scrub"?
> 
By changing the timeout on the WD to 180, you have just told the kernel
to wait for up to three minutes if the WD has a problem. That is
probably going to cause users of the computer a lot of grief should such
a problem occur.
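
To see what the kernel is currently set to wait, you can read the timeout back from sysfs. This is only a minimal sketch: it assumes the usual Linux location /sys/block/<device>/device/timeout, and the function names and the device name "sdb" in the usage note are made up for illustration.

```python
from pathlib import Path


def read_timeout(device: str, sysfs_root: str = "/sys") -> int:
    """Return the kernel command timeout, in seconds, for a block device.

    Assumes the standard sysfs layout; sysfs_root is only a parameter so
    the function can be tried against a scratch directory.
    """
    path = Path(sysfs_root) / "block" / device / "device" / "timeout"
    return int(path.read_text().strip())


def worst_case_stall(timeout_s: int) -> str:
    """Format the longest time the kernel will wait on one stuck command."""
    minutes, seconds = divmod(timeout_s, 60)
    return f"{minutes} min {seconds} s"
```

For example, worst_case_stall(read_timeout("sdb")) would report the longest stall for a hypothetical drive sdb; with the timeout set to 180 that is the three minutes mentioned above.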

Is the Seagate faulty? The official spec says that you should EXPECT one
unrecoverable read error for roughly every ten terabytes read. I don't
know how much usage this drive has had, and in real life you would
normally get much further without a problem, but that is all the
manufacturer guarantees. Have you read the explanation of the timeout
problem? That's probably what brought the array down, and until the
array is back the raid code can't try to fix the error. So it's *likely*
that once the array is back up, this problem will go away. An occasional
read error is not, by itself, a sign of a failing drive.
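
The "ten terabytes" figure can be checked with a little arithmetic. The sketch below assumes the commonly quoted datasheet figure of one unrecoverable read error per 10^14 bits read; that number is an assumption and is not taken from this drive's datasheet, so substitute the real figure if you have it.

```python
BITS_PER_BYTE = 8
TB = 1e12  # decimal terabyte, as drive vendors count it


def bytes_per_expected_error(bits_per_error: float = 1e14) -> float:
    """Bytes read, on average, per unrecoverable read error (assumed spec)."""
    return bits_per_error / BITS_PER_BYTE


def expected_read_errors(bytes_read: float, bits_per_error: float = 1e14) -> float:
    """Expected number of unrecoverable read errors for a given volume read."""
    return bytes_read * BITS_PER_BYTE / bits_per_error


if __name__ == "__main__":
    # Terabytes read per expected error under the assumed spec
    print(bytes_per_expected_error() / TB)
    # Expected errors for one full read of a 3 TB drive
    print(expected_read_errors(3 * TB))
```

Under that assumption the spec works out to about 12.5 TB per error, so a single full read of a 3 TB drive has an expected error count of roughly a quarter. That is a worst-case guarantee, not what drives typically do.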

I've just looked at your smartctl output, and I think you need to run
"smartctl -x" on the Seagate. Compare it with the output from my
Barracuda ...

https://raid.wiki.kernel.org/index.php/Drive_Data_Sheets#ST3000DM001_.282014.29_3_TB

Note that it says "SCT Error Recovery Control command not supported" :-)

But what you want to look for is lines like

  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0

The last column is the raw count, and here it is 0. In other words, the
drive hasn't had to do any real recovery, and is healthy - all the
sectors are where they should be, with no fancy jiggery-pokery to try
and keep your data safe. You need to distinguish between the odd hiccup
- which is what your read error *probably* is - and a serious problem
where the drive is beginning to have to work hard to keep your data
safe.
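
If you want to check that attribute line mechanically, something like the following would do. It is a sketch only: it assumes the eight-column layout that "smartctl -x" prints for vendor attributes (ID, name, flags, value, worst, threshold, fail, raw), and the class and function names are invented for this example.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flags: str
    value: int
    worst: int
    thresh: int
    fail: str
    raw: str  # kept as text; some attributes carry extra annotations here


def parse_attribute(line: str) -> SmartAttribute:
    """Split one attribute row into its eight columns."""
    fields = line.split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"not an attribute line: {line!r}")
    ident, name, flags, value, worst, thresh, fail, raw = fields
    return SmartAttribute(int(ident), name, flags, int(value),
                          int(worst), int(thresh), fail, raw.strip())


def no_reallocated_sectors(attr: SmartAttribute) -> bool:
    """True if this is attribute 5 and its raw count is zero."""
    return attr.attr_id == 5 and int(attr.raw.split()[0]) == 0


line = "  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0"
attr = parse_attribute(line)
print(attr.name, attr.raw, no_reallocated_sectors(attr))
```

A raw count that is non-zero and climbing is the "working hard" case; a zero, as in the line above, is the healthy one.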

Scrubbing?

https://raid.wiki.kernel.org/index.php/Scrubbing_the_drives
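
In practice a scrub is started by writing "check" to the array's sync_action file in sysfs, and the result shows up in mismatch_cnt once it finishes. A rough sketch, assuming the array is called md0 (a made-up name, use your own) and that it runs as root:

```python
from pathlib import Path


def md_dir(array: str, sysfs_root: str = "/sys") -> Path:
    """Directory holding the md control files for one array."""
    return Path(sysfs_root) / "block" / array / "md"


def start_scrub(array: str, sysfs_root: str = "/sys") -> None:
    """Ask the md driver to read every stripe and verify redundancy."""
    (md_dir(array, sysfs_root) / "sync_action").write_text("check\n")


def mismatch_count(array: str, sysfs_root: str = "/sys") -> int:
    """Mismatch count reported by the last check."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text().strip())
```

You would call start_scrub("md0"), wait for the check to finish (progress is shown in /proc/mdstat), then look at mismatch_count("md0"). The sysfs_root parameter is only there so the functions can be tried against a scratch directory.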

Cheers,
Wol