Re: Re-map disk sectors in userspace when rewriting after read errors

Robin Hill <robin@xxxxxxxxxxxxxxx> · Fri, 18 Sep 2009 12:15:11 +0100

On Fri Sep 18, 2009 at 01:52:14PM +0300, Majed B. wrote:

> Well, I think my case is different from Matthias's, and I can't
> reconstruct the data anymore, as you said, Robin.
> 
> So this leaves me with a degraded array with bad sectors and a dodgy
> filesystem.
> 
> You see, I can mount the LVM Logical Volume (formatted with XFS), but
> as soon as I hit some bad sectors, XFS complains and then one of the
> array disks jumps out.
> Just now, one disk exited the array and renamed itself from sdg to sdj
> (this is the first time this has happened). According to smartctl -a
> /dev/sdj, there are no bad sectors, but I still get this in
> /var/log/messages
> 
The renaming would suggest a hard bus reset - not what I'd expect with
just a bad block.

> Here's some output from smartctl -a /dev/sdg:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
> 
A lot of these are only updated by offline tests, so they won't change
in normal use, even if the disk has problems.  Have you run any SMART
self-tests on the disk?  The long test usually reports a failure if the
disk has read errors.
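
As a rough illustration of the point about offline-only attributes, here
is a minimal Python sketch that parses the attribute rows quoted above
and lists the ones whose update column reads "Offline" (smartctl labels
that column UPDATED).  The function names and the regular expression are
my own, not part of smartmontools, and the sample is just the quoted
output with the wrapped lines rejoined.  A long self-test is normally
started with "smartctl -t long /dev/sdg" and its result read back with
"smartctl -l selftest /dev/sdg".

```python
import re

# Attribute rows as quoted earlier in this thread (wrapped lines rejoined).
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
"""

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, then the leading digits of the raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)


def parse_attributes(text):
    """Return one dict per attribute row; lines that don't match are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            for key in ("id", "value", "worst", "thresh", "raw"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def offline_only(rows):
    """Names of attributes that are only refreshed by offline data collection."""
    return [row["name"] for row in rows if row["updated"] == "Offline"]


if __name__ == "__main__":
    print(offline_only(parse_attributes(SAMPLE)))
```

A zero raw value on an "Offline" attribute therefore says little until
an offline collection or self-test has actually been run.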

> Plan B: Since I cloned the disk with bad sectors to another, what
> would happen if I zeroed the damaged one then cloned the clone to it?!
> 
That depends on the actual condition of the disk.  Zeroing it should
remap any bad blocks though, as drives normally reallocate a bad sector
when it is next written.
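
For what it's worth, Plan B amounts to two passes over the damaged disk:
overwrite everything with zeros, then copy the clone back from offset 0.
The Python sketch below shows that sequence under some assumptions: the
helper names are hypothetical, the paths would be whatever your devices
are called, and in practice most people would just use dd for both
steps.  It is destructive to the target, and it makes no attempt to
confirm that the drive actually remapped anything; checking the
reallocation counters with smartctl afterwards is still up to you.

```python
import os

CHUNK = 1024 * 1024  # write 1 MiB at a time


def device_size(path):
    """Size in bytes; seeking to the end also works for Linux block devices."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def zero_fill(path, length, chunk=CHUNK):
    """Overwrite the first `length` bytes of `path` with zeros."""
    zeros = bytes(chunk)
    with open(path, "r+b") as dst:
        remaining = length
        while remaining > 0:
            n = min(chunk, remaining)
            dst.write(zeros[:n])
            remaining -= n
        dst.flush()
        os.fsync(dst.fileno())  # push the writes out to the device


def copy_back(src_path, dst_path, chunk=CHUNK):
    """Copy `src_path` over `dst_path` from offset 0; returns bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
        dst.flush()
        os.fsync(dst.fileno())
    return copied
```

Usage would be along the lines of zero_fill(damaged, device_size(damaged))
followed by copy_back(clone, damaged), with both variables pointing at
the right devices (triple-check which is which first).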

> I do realize that there will be zeros in the areas of bad sectors, but
> how will mdadm/md behave? Would a resync fail?
> 
mdadm doesn't care what data is on it, as long as the array metadata is
valid.  Provided all disks are readable (and the new disk is writable)
then a resync would certainly work - whether the filesystem will be
usable afterwards depends on how many zeroed blocks there are and where
they fall.
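
To get a rough idea of how many zeroed blocks there are and where they
fall, one option is to scan the clone for runs of all-zero sectors.
This is only a sketch: it assumes the cloning tool filled unreadable
sectors with zeros, it assumes 512-byte sectors, and zero_runs is a
hypothetical helper of mine.  Legitimately empty sectors show up as
well, so the result is an upper bound on the damage, not a precise map.

```python
SECTOR = 512  # assumed sector size in bytes


def zero_runs(path, sector=SECTOR):
    """Return (first_sector, count) for each run of all-zero sectors in `path`."""
    runs = []
    start = None  # first sector of the run being tracked, if any
    index = 0
    zero = bytes(sector)
    with open(path, "rb") as f:
        while True:
            buf = f.read(sector)
            if not buf:
                break
            if buf == zero[:len(buf)]:
                if start is None:
                    start = index
            elif start is not None:
                runs.append((start, index - start))
                start = None
            index += 1
    if start is not None:
        runs.append((start, index - start))
    return runs


if __name__ == "__main__":
    import sys

    for first, count in zero_runs(sys.argv[1]):
        print(f"sectors {first}..{first + count - 1} ({count} zeroed)")
```

Reading one sector at a time is slow on a whole disk, so treat this as a
way to show the idea.  A rescue tool's own log of unreadable areas, if
you have one, is a far better source.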

> I can run fsck at that point and files residing on bad sectors will be
> the only affected ones, correct?
> 
Files/directories yes - if the directory inodes get zeroed then all the
files within the directory will be affected (renamed & moved to
/lost+found).

I've had to do just this myself recently, and despite the low number of
zeroed blocks, there was an awful lot of filesystem damage (I ended up
restoring most of it from backup).

    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |