Re: failed sector detected but disk still active ?

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Sat, 14 May 2022 14:46:37 +0100

On 13/05/2022 17:02, Piergiorgio Sartor wrote:
> [Mon May  2 03:36:25 2022] Add. Sense: Unrecovered read error
> [Mon May  2 03:36:25 2022] sd 0:0:2:0: [sdc] CDB:
> [Mon May  2 03:36:25 2022] Read(10): 28 00 10 56 55 80 00 04 00 00
> [Mon May  2 03:36:25 2022] end_request: critical medium error, dev sdc, sector 274093444
> [Mon May  2 04:06:32 2022] md: md0: data-check done.
>
> The error is reported from the device.
>
> As far as I know, and please someone correct
> me if I'm wrong, when a device has an error,
> "md" tries to re-write the data, using the
> redundancy, and, if no error occurs, it just
> continues, no reason to kick the device out
> of the array.

Correct. If the underlying disk returns a read error, raid recovery
kicks in: the missing block is recalculated from the redundancy on the
other drives, returned to the caller and written back to the disk.
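
To make "calculated" concrete, here is a minimal sketch of the
single-parity case. It assumes a raid5-style stripe (the thread doesn't
say which raid level is in use; on a mirror the block is simply read
from the other copy), the block contents are made up, and this is an
illustration of the idea, not md's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus their parity block.
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
d2 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([d0, d1, d2])

# Pretend the disk holding d1 returned a read error: XOR of everything
# that is still readable gives the missing block back.
rebuilt = xor_blocks([d0, d2, parity])
print(rebuilt == d1)
```

The rebuilt block is what gets handed to the caller and written back
over the bad sector.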

There's a whole bunch of reasons how/why this can occur. If it's a 
transient failure and the re-write succeeds perfectly, everything is 
normally hunky-dory.

Or there could be a real problem with the drive, the drive re-locates
the dodgy sector when it is rewritten, and everything APPEARS hunky-dory.

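Or the rewrite fails, raid assumes the drive is faulty and kicks it out.
Desktop drives make this far more likely: they can spend longer retrying
a bad sector than the kernel's command timeout allows, so the request
fails even though the drive would eventually have answered. That's why
you should never use desktop drives unless you know EXACTLY what you
are doing!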

The error message is "critical medium error", so I suspect we have a
real problem with the disk.
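
As a sanity check, the CDB in the log can be decoded by hand to see that
the failing sector really does sit inside the request the drive
rejected. A small sketch, using the standard READ(10) layout (opcode
0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and
assuming the kernel's sector number and the drive's LBA are in the same
512-byte units:

```python
def decode_read10(cdb: bytes):
    """Return (start_lba, block_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, count

# Bytes copied from the kernel log quoted above.
cdb = bytes.fromhex("28 00 10 56 55 80 00 04 00 00")
lba, count = decode_read10(cdb)

failed_sector = 274093444                     # from the end_request line
offset = failed_sector - lba
print(lba, count, offset)                     # 274093440 1024 4
```

So the read covered 1024 blocks starting at 274093440, and the drive
gave up on the fifth block of that request.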

FIRST run SMART on the disk (smartctl -a /dev/sdc, and ideally a long
self-test with smartctl -t long /dev/sdc) and see what that reports. If
that's not happy, REPLACE THE DRIVE PRONTO.
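
If you'd rather not eyeball the attribute table, something like the
sketch below pulls out the counters that usually matter for a medium
error. The sample table is made up for illustration (it is NOT from the
disk in this thread), and the attribute names are the common ATA ones;
they are vendor-specific, so treat the watch list as an assumption.

```python
# Counters that commonly indicate failing media (an assumed, not exhaustive, list).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                pass  # some raw values are not plain integers; skip those
    return attrs

def worrying(attrs):
    """Return the watched counters that are non-zero."""
    return {name: value for name, value in attrs.items()
            if name in WATCH and value > 0}

# Hypothetical sample output, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21503
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(worrying(parse_smart_attributes(SAMPLE)))
```

For a real disk you would feed it the output of smartctl -A /dev/sdc
(e.g. via subprocess.run) instead of SAMPLE.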

If SMART is happy, run a raid scrub.
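
A scrub is started through md's sysfs interface by writing "check" to
sync_action; that is what produces the "data-check done" line in the
log above. A minimal sketch, assuming the array is md0 as in the log
and that you are running as root; the function names and the sysfs
parameter are my own, added so the code can be tried against a scratch
directory first.

```python
from pathlib import Path

def start_scrub(md="md0", sysfs="/sys/block"):
    """Ask md to start a read-only consistency check of the array."""
    (Path(sysfs) / md / "md" / "sync_action").write_text("check\n")

def scrub_status(md="md0", sysfs="/sys/block"):
    """Return (current sync action, mismatch count) for the array."""
    base = Path(sysfs) / md / "md"
    action = (base / "sync_action").read_text().strip()
    mismatches = int((base / "mismatch_cnt").read_text().strip())
    return action, mismatches

# Typical use, as root:
#   start_scrub()
#   ... wait until scrub_status()[0] is "idle" again ...
#   then look at scrub_status()[1] and at the kernel log for new read errors
```

The check reads every sector, so any other dodgy sectors on the drive
should show up as fresh errors while it runs.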

Whatever happens, if you haven't replaced the drive, start monitoring
SMART. If the disk error counts start climbing, that's cause for
concern and a reason to replace the drive.
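
"Climbing" just means comparing today's counters with the last set you
saved. A rough sketch of that comparison, with made-up readings and an
assumed list of counter names (smartd from smartmontools does this sort
of tracking properly; this only shows the idea):

```python
# Assumed watch list; attribute names vary between vendors.
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def climbing(previous, current, watch=WATCH):
    """Return {name: (old, new)} for watched counters that have increased."""
    return {
        name: (previous.get(name, 0), current.get(name, 0))
        for name in watch
        if current.get(name, 0) > previous.get(name, 0)
    }

# Hypothetical raw values from two checks a week apart.
last_week = {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 0}
today = {"Reallocated_Sector_Ct": 24, "Current_Pending_Sector": 0}

print(climbing(last_week, today))
```

A counter that is non-zero but stable is one thing; one that keeps
going up between checks is the signal to swap the drive.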

Cheers,
Wol