Re: On URE and RAID rebuild - again!

Mikael Abrahamsson <swmike@xxxxxxxxx> · Mon, 4 Aug 2014 09:02:22 +0200 (CEST)

On Sun, 3 Aug 2014, NeilBrown wrote:

You are very unlikely to see UREs just by reading the drive over and over
again.  You could easily do that for years and not get an error.  Or maybe you
got one just then.

Also you might get an intermittent URE. I have had drives where the sector 
would be successfully read after several attempts. Why the drive 
doesn't re-write the sector when it needs hundreds or thousands of 
attempts to read it, I don't know. I would very much like to talk to 
someone who really knows how these things work end-to-end, but I don't 
have access to anyone like that. Most of the information to be found 
publicly is from people deducing behaviour from experience, from the 
outside of this "black box".

2) How should UREs be visible? Via error reporting through dmesg?

If you want to see how the system responds when it hits a URE, you can use the
hdparm command and the "--make-bad-sector" option.  There is also a
"--repair-sector" option which will (hopefully) repair the sector when you
are done.
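
For anyone who wants to script that cycle, here is a rough sketch of how the hdparm invocations could be assembled. It only builds the command lines and prints them by default. The device path and sector number are placeholders, the helper names are made up for illustration, and I am assuming (from my reading of the hdparm man page) that the destructive options also need "--yes-i-know-what-i-am-doing". Do not point this at a disk holding data you care about.

```python
import subprocess

def hdparm_bad_sector_commands(device, sector):
    """Build (but do not run) hdparm command lines for a make/read/repair cycle.

    `device` and `sector` are supplied by the caller; nothing here checks
    that the sector is safe to destroy.
    """
    # Assumption: hdparm refuses the destructive options without this flag.
    confirm = ["hdparm", "--yes-i-know-what-i-am-doing"]
    return {
        "make_bad": confirm + ["--make-bad-sector", str(sector), device],
        "read": ["hdparm", "--read-sector", str(sector), device],
        "repair": confirm + ["--repair-sector", str(sector), device],
    }

def run(cmd, dry_run=True):
    """Print the command in dry-run mode, otherwise execute it."""
    if dry_run:
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, capture_output=True, text=True)

if __name__ == "__main__":
    # Hypothetical device and sector, dry run only.
    for name, cmd in hdparm_bad_sector_commands("/dev/sdX", 123456).items():
        run(cmd)
```

With dry_run left at its default, this just shows what would be executed, which is probably the right way to start.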

Does this command behave like a real URE, i.e. will the drive keep retrying 
until its timeout? (Which is what, 90 seconds on a consumer drive and 7 
seconds on an enterprise drive, right?)

If it fails immediately then it's not testing the same thing as a "real" 
URE. Might be good to know if one does testing that's supposed to emulate 
real failures.
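
One way to check this from the outside would be to time the failing read. Below is a minimal sketch, assuming a Unix system: it reads a block at a given offset and reports how long the call took and which errno came back, if any. The function name and the 4096-byte default are my own choices. On a real device the page cache can hide the drive's behaviour, so one would want to drop caches or use direct I/O first; this sketch does neither.

```python
import os
import time

def timed_read(path, offset, length=4096):
    """Read `length` bytes at `offset` from `path`.

    Returns (elapsed_seconds, bytes_read, errno_or_None). An immediate
    failure versus one that takes many seconds hints at whether the
    drive is retrying internally before giving up.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        try:
            data = os.pread(fd, length, offset)
            err = None
        except OSError as exc:
            # A media error usually surfaces here, typically as EIO.
            data = b""
            err = exc.errno
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return elapsed, len(data), err
```

Calling it with the device path and sector_number * 512 as the offset (assuming 512-byte logical sectors) would give a number to compare against the drive's expected error recovery timeout.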

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx