Re: On URE and RAID rebuild - again!

On Sat, 02 Aug 2014 18:21:07 +0200 Gionatan Danti <g.danti@xxxxxxxxxx> wrote:

> Hi again,
> I started a little experiment regarding BER/UREs and I would like some
> informed feedback.
> 
> As I had a spare 500 GB Seagate Barracuda 7200.12 (BER 10^14 max: 
> http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%207200.12/100529369e.pdf), 
> I started to read it continuously with the following shell command: dd 
> if=/dev/sdb of=/dev/null bs=8M iflag=direct
> 
> The drive was used as a member of a RAID10 set on one of my test 
> machines, so I assume its platters are full of pseudo-random data. At 
> 100 MB/s, I am now at about 15 TB read from it and I don't see any 
> problem reported by the kernel.
> 
> Some questions:
> 1) should I try a different / harder method to generate UREs? Maybe writing
> a pre-determined pseudo-random pattern and then comparing the read-back data
> (I think this is more appropriate for catching silent data corruption, by
> the way)?

You are very unlikely to see UREs just by reading the drive over and over
again.  You could easily do that for years and not get an error.  Or maybe you
would hit one right away.
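
As a back-of-the-envelope check (my arithmetic, using only the 10^14 figure
quoted from the datasheet): a worst-case rate of one URE per 10^14 bits read
comes to one expected error per roughly 12.5 TB, and since it is a "max" bound
rather than a typical rate, reading 15 TB without a single error is entirely
consistent with the spec.

    # bytes per expected URE at a worst-case rate of 1 per 10^14 bits read
    echo $(( 10**14 / 8 ))    # prints 12500000000000, i.e. ~12.5 TB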


> 2) how should UREs be visible? Via error reporting through dmesg?

If you want to see how the system responds when it hits a URE, you can use the
hdparm command and the "--make-bad-sector" option.  There is also a
"--repair-sector" option which will (hopefully) repair the sector when you
are done.
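
A minimal sketch of such a test (untested here; the sector number 12345678 and
/dev/sdX are only placeholders, 512-byte logical sectors are assumed, and this
deliberately destroys the data in that sector, so read hdparm(8) first - some
hdparm versions also demand an extra confirmation flag for these options):

    # flag one sector as uncorrectable so reads of it fail
    hdparm --make-bad-sector 12345678 /dev/sdX
    # a direct read of that sector should now produce an I/O error in dmesg
    dd if=/dev/sdX of=/dev/null bs=512 skip=12345678 count=1 iflag=direct
    # rewrite the sector so it becomes readable again
    hdparm --repair-sector 12345678 /dev/sdX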

NeilBrown


> 
> Thanks.
> 
> Il 2014-07-31 09:16 Gionatan Danti ha scritto:
> >> Yes, you can usually get your data back with mdadm.
> >> 
> >> With latest code, a URE during recovery will cause a bad-block to be 
> >> recorded
> >> on the recovered device, and recovery will continue.  You end up with 
> >> a
> >> working array that has a few unreadable blocks on it.
> >> 
> >> NeilBrown
> > 
> > This is very good news :)
> > In the case of parity RAID I assume the entire stripe is marked as bad, but
> > with a mirror (e.g. RAID10) only a single block (often 512 B) is marked
> > bad on the recovered device, right?
> > 
> > From which mdadm/kernel version is the new behavior implemented? Maybe
> > the software RAID on my CentOS 6.5 is stronger than expected ;)
> > 
> > Regards.
> 
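
(On the bad-block log discussed above: one way to inspect what has been
recorded, assuming a 1.x-metadata array with a bad-block log configured, is
mdadm's --examine-badblocks option plus the per-member files md exposes in
sysfs. The device and array names below are placeholders.)

    # list bad blocks recorded in a member device's metadata
    mdadm --examine-badblocks /dev/sdb1
    # the kernel's view for a running array, one file per member device
    cat /sys/block/md0/md/dev-sdb1/bad_blocks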


