On 13/05/2022 17:02, Piergiorgio Sartor wrote:
[Mon May 2 03:36:25 2022] Add. Sense: Unrecovered read error
[Mon May 2 03:36:25 2022] sd 0:0:2:0: [sdc] CDB:
[Mon May 2 03:36:25 2022] Read(10): 28 00 10 56 55 80 00 04 00 00
[Mon May 2 03:36:25 2022] end_request: critical medium error, dev sdc, sector 274093444
[Mon May 2 04:06:32 2022] md: md0: data-check done.
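For reference, the CDB line above can be decoded by hand. A READ(10) command carries the starting LBA in bytes 2-5 and the transfer length in bytes 7-8. The sketch below uses a hypothetical helper, `decode_read10`, and assumes 512-byte logical sectors so that the kernel's sector number equals the device LBA. Under that assumption, the failed sector falls inside the 1024-block read that was issued:

```python
# Sketch: decode the READ(10) CDB from the kernel log above.
# Assumes 512-byte logical sectors, so kernel "sector" == device LBA.

def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, transfer length in blocks) for a READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:  # 0x28 is the READ(10) opcode
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, length


lba, length = decode_read10("28 00 10 56 55 80 00 04 00 00")
failed_sector = 274093444  # from "critical medium error, dev sdc, sector ..."
print(f"read of {length} blocks at LBA {lba}; error at offset {failed_sector - lba}")
# prints: read of 1024 blocks at LBA 274093440; error at offset 4
```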
The error is reported from the device.
As far as I know, and please someone correct
me if I'm wrong, when a device has an error,
"md" tries to re-write the data, using the
redundancy, and, if no error occurs, it just
continues; there's no reason to kick the device out
of the array.
Correct. If the underlying disk returns an error, raid recovery kicks
in. The missing block is calculated, returned to the caller and written
back to the disk.
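To make "the missing block is calculated" concrete, consider a RAID-5 layout. This is an assumption for illustration: RAID-1 simply reads another mirror, and RAID-6 adds a second syndrome. With RAID-5, the lost chunk is the XOR of every surviving chunk in the stripe, parity included. The sketch below uses a hypothetical four-member stripe and made-up helper names. It illustrates the idea and is not md's actual code:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(stripe, failed):
    """Rebuild member `failed` of a RAID-5 stripe from all the other members."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


# Hypothetical 4-member stripe: three data chunks plus their XOR parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]

# Member 1 returns an unrecovered read error, so rebuild it from the rest.
# md would then return this data to the caller and rewrite it to the
# failing member (the rewrite is not modelled here).
rebuilt = recover_block(stripe, failed=1)
print(rebuilt)  # b'BBBB'
```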
There are a whole bunch of reasons why this can occur. If it's a
transient failure and the re-write succeeds perfectly, everything is
normally hunky-dory.
Or there could be a problem with the drive: the drive re-locates the
dodgy sector, and everything APPEARS hunky-dory.
Or the rewrite fails, and raid assumes the drive is faulty and kicks it out.
That's why you should never use desktop drives unless you know EXACTLY
what you are doing!
The error message is "critical medium error", so I suspect we have a
real problem with the disk.
FIRST run SMART on the disk and see what it reports. If it's not
happy, REPLACE THE DRIVE PRONTO.
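Here is a minimal way to script that first step. It assumes smartmontools is installed and the script runs as root. `smartctl -H` prints the drive's overall health verdict. The two line formats matched below are the usual ATA and SCSI wordings, and the helper names are hypothetical. A PASSED verdict is only the drive's own self-assessment, so the full `smartctl -a` report is still worth reading.

```python
import subprocess
from typing import Optional


def smart_health_passed(report: str) -> Optional[bool]:
    """Parse `smartctl -H` output; None if no verdict line was found."""
    for line in report.splitlines():
        if "overall-health self-assessment test result:" in line:  # ATA wording
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        if line.startswith("SMART Health Status:"):  # SCSI wording
            return line.split(":", 1)[1].strip() == "OK"
    return None


def check_disk(device: str) -> Optional[bool]:
    # smartctl's exit status is a bit mask of conditions, so a non-zero
    # status is not treated as an error here; the text is parsed instead.
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return smart_health_passed(result.stdout)
```

Usage would be something like `check_disk("/dev/sdc")`, where `False` or `None` means it's time to look much more closely at that drive.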
If SMART is happy, run a raid scrub.
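For the scrub, md exposes a sysfs control. Writing `check` to `/sys/block/<array>/md/sync_action` starts the same kind of data-check seen in the log above, and `mismatch_cnt` reports inconsistencies once the check is done. The sketch below assumes the array is `md0` and that you are root. The `sysfs` parameter is a hypothetical addition that lets the functions be pointed at a fake tree for testing.

```python
from pathlib import Path


def md_attr(md: str, name: str, sysfs: str = "/sys") -> Path:
    """Path to an md array attribute, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs) / "block" / md / "md" / name


def start_scrub(md: str = "md0", sysfs: str = "/sys") -> None:
    action = md_attr(md, "sync_action", sysfs)
    current = action.read_text().strip()
    if current != "idle":  # don't interrupt a resync/recovery already running
        raise RuntimeError(f"{md} is busy: {current}")
    # "check" counts mismatches without fixing them; "repair" would rewrite them.
    action.write_text("check\n")


def mismatch_count(md: str = "md0", sysfs: str = "/sys") -> int:
    # Only meaningful once the check has finished (sync_action back to "idle").
    return int(md_attr(md, "mismatch_cnt", sysfs).read_text())
```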
Either way, if you haven't replaced the drive, start monitoring SMART.
If disk errors start climbing, that's cause for concern, and a reason
to replace the drive.
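One way to watch for "climbing" errors is to snapshot the raw values of the sector-health attributes periodically and flag any increase. This sketch assumes an ATA drive and the attribute table printed by `smartctl -A`. The attribute names below are common but vendor-dependent, and the helpers are hypothetical. In practice, smartmontools' own `smartd` daemon can do this kind of tracking and send mail.

```python
from typing import Dict, Tuple

# Raw counts that usually indicate a deteriorating disk surface.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(report: str) -> Dict[str, int]:
    """Map attribute name -> raw value from `smartctl -A` ATA output."""
    attrs = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                pass  # non-numeric raw values (e.g. "12h+34m") are skipped
    return attrs


def climbing(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Watched attributes whose raw value went up between two snapshots."""
    return {name: (old[name], new[name]) for name in WATCHED
            if name in old and name in new and new[name] > old[name]}
```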
Cheers,
Wol