Re: self healing of MD raid

Robin Hill <robin@xxxxxxxxxxxxxxx> · Tue, 2 Jun 2015 20:14:06 +0100

On Tue Jun 02, 2015 at 01:01:31PM -0500, Alireza Haghdoost wrote:
> On Tue, Jun 2, 2015 at 12:53 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> > On Tue Jun 02, 2015 at 07:22:36PM +0200, keld@xxxxxxxxxx wrote:
> >
> >> Hi list
> >>
> >> I wonder if MD RAID software is kind of self healing.
> >> That is, if a read operation gets an IO error, then the logical
> >> sector of the RAID can be recreated from the other sector(s)
> >> of the raid, and then written out on the block which gave a read error.
> >>
> >> This could work both for the mirrored RAID types, and for the
> >> parity oriented RAID types.
> >>
> >> Is that implemented in MD RAID?
> >>
> >> Similarly the self healing process could be part of the monitoring
> >> background processes.
> >>
> >> Best regards
> >> keld
> >
> > Yes, this is implemented as standard for all forms of RAID with
> > redundant data (parity/mirror). A read error will automatically trigger
> > a rewrite of the faulty block with data recovered from the other
> > members. This rewrite should also trigger a remapping within the drive
> > if the original block proves to be unwritable as well.
> >
> > Running a regular check (echo check > /sys/block/mdX/md/sync_action)
> > will do a full read of all active members in an array and therefore
> > trigger rewrites for any unreadable blocks. This is often set up as part
> > of the standard distro cron jobs, but should be set up manually if not.
> >
> 
> Do you know what the MD action would be if it cannot recover the
> faulty block from the other members? Assuming not enough members are
> online, does it just print a warning in dmesg? Does anything in
> the MD layer keep track of the number of corruption events like this?
> 
> --Alireza
> 

If the faulty block cannot be rebuilt from the other members then a read
error is passed on to the application and the array keeps running (the
same way a normal block device would handle a read error).

If you have a bad block log on the array member (a relatively new
feature) then it will record that the block is invalid. Otherwise I
don't think there's any tracking within the md layer - you'd need to
fall back on whatever tracking there is on the underlying block device
(e.g. SMART data).
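
As a rough sketch of where to look, assuming a kernel that exposes the
per-member bad_blocks and errors attributes under /sys/block/mdX/md/dev-*/
(worth verifying on your own system), something like the following prints
one summary line per member. The function name md_member_report is made up
for illustration, and the directory is passed in as an argument rather than
hard-coded:

```shell
#!/bin/sh
# Sketch only: summarise what md exposes per array member in sysfs.
# md_member_report is a hypothetical helper name, not part of mdadm.
# Argument: the array's md directory, e.g. /sys/block/md0/md (assumed path).
md_member_report() {
    md_dir=$1
    for dev in "$md_dir"/dev-*; do
        # Skip the unexpanded glob if the directory has no members
        [ -d "$dev" ] || continue
        name=${dev##*/dev-}

        # errors: approximate count of read errors seen on this member
        # that did not get it kicked from the array
        errors=$(cat "$dev/errors" 2>/dev/null || echo "n/a")

        # bad_blocks: one "start length" line (in sectors) per recorded range;
        # grep -c exits non-zero on zero matches, hence the || true
        if [ -r "$dev/bad_blocks" ]; then
            bad=$(grep -c . "$dev/bad_blocks" || true)
        else
            bad="n/a"
        fi

        echo "$name errors=$errors bad_ranges=$bad"
    done
}
```

Calling md_member_report /sys/block/md0/md would then list each member
with its error count and the number of bad ranges logged. Members without
a bad block log show "n/a" there, in which case SMART on the underlying
drive is the place to look.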

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |