Re: self healing of MD raid

On Tue Jun 02, 2015 at 01:01:31PM -0500, Alireza Haghdoost wrote:
> On Tue, Jun 2, 2015 at 12:53 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> > On Tue Jun 02, 2015 at 07:22:36PM +0200, keld@xxxxxxxxxx wrote:
> >
> >> Hi list
> >>
> >> I wonder if MD RAID software is kind of self healing.
> >> That is, if a read operation gets an IO error, then the logical
> >> sector of the RAID can be recreated from the other sector(s)
> >> of the raid, and then written out on the block which gave a read error.
> >>
> >> This could work both for the mirrored RAID types, and for the
> >> parity-oriented RAID types.
> >>
> >> Is that implemented in MD RAID?
> >>
> >> Similarly, the self-healing process could be part of the monitoring
> >> background processes.
> >>
> >> Best regards
> >> keld
> >
> > Yes, this is implemented as standard for all forms of RAID with
> > redundant data (parity/mirror). A read error will automatically trigger
> > a rewrite of the faulty block with data recovered from the other
> > members. This rewrite should also trigger a remapping within the drive
> > if the original block proves to be unwritable as well.
> >
> > Running a regular check (echo check > /sys/block/mdX/md/sync_action)
> > will do a full read of all active members in an array and therefore
> > trigger rewrites for any unreadable blocks. This is often set up as part
> > of the standard distro cron jobs, but should be set up manually if not.
> >
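
A minimal sketch of the check described above, wrapped in shell functions. The sync_action and mismatch_cnt files are the md sysfs attributes; the function names and the MD_SYSFS_ROOT override are made up for illustration, so treat this as a starting point rather than what any distro cron job actually ships.

```shell
#!/bin/sh
# Sketch: start a "check" scrub on one md array and inspect the result.
# MD_SYSFS_ROOT is a hypothetical override so the functions can be tried
# against a scratch directory; on a real system it defaults to /sys/block.
MD_SYSFS_ROOT="${MD_SYSFS_ROOT:-/sys/block}"

# Print the current sync_action of array $1 (idle, check, repair, ...).
md_sync_action() {
    cat "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# Request a check only when the array reports idle.
# Returns non-zero if nothing was started.
md_start_check() {
    action=$(md_sync_action "$1") || return 1
    if [ "$action" != "idle" ]; then
        echo "$1: busy ($action), not starting a check" >&2
        return 1
    fi
    echo check > "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# After a check, mismatch_cnt is the number of sectors that a repair
# would have rewritten.
md_mismatch_cnt() {
    cat "$MD_SYSFS_ROOT/$1/md/mismatch_cnt"
}
```

As root, something like `md_start_check md0` followed later by `md_mismatch_cnt md0` would be the typical use (md0 is only an example array name).
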
> 
> Do you know what would be the MD action if it cannot recover the
> faulty block from the other members? Assuming not enough members are
> online, does it just print a warning in dmesg? Does anything in
> the MD layer keep track of the number of corruption events like this?
> 
> --Alireza
> 

If the faulty block cannot be rebuilt from the other members then a read
error is passed on to the application and the array keeps running (the
same way a normal block device would handle a read error).

If you have a bad block log on the array member (a relatively new
feature) then it will record that the block is invalid. Otherwise I
don't think there's any tracking within the md layer - you'd need to
fall back on whatever tracking there is on the underlying block device
(e.g. SMART data).
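
For anyone wanting to look at what md itself exposes per member, here is a rough sketch. It assumes the per-device sysfs attributes described in the kernel md documentation: "errors" (as I understand it, an approximate count of read errors that did not get the device kicked from the array) and "bad_blocks" (one "start length" pair in sectors per line, present when a bad block log exists). The function name and the MD_SYSFS_ROOT override are hypothetical, and attributes missing on older kernels are reported as "unknown".

```shell
#!/bin/sh
# Sketch: summarise per-member error information for one md array.
# MD_SYSFS_ROOT is a hypothetical override for trying this against a
# scratch directory; on a real system it defaults to /sys/block.
MD_SYSFS_ROOT="${MD_SYSFS_ROOT:-/sys/block}"

# For each member (dev-*) of array $1, print one line:
#   <member> errors=<N> bad_block_ranges=<M>
md_member_report() {
    for dev in "$MD_SYSFS_ROOT/$1"/md/dev-*; do
        # Skip the unexpanded glob when the array has no dev-* entries.
        [ -d "$dev" ] || continue

        errors=unknown
        if [ -r "$dev/errors" ]; then
            errors=$(cat "$dev/errors")
        fi

        ranges=unknown
        if [ -r "$dev/bad_blocks" ]; then
            # Count non-empty lines; grep exits non-zero on zero matches,
            # hence the "|| true".
            ranges=$(grep -c . "$dev/bad_blocks" || true)
        fi

        echo "${dev##*/} errors=$errors bad_block_ranges=$ranges"
    done
}
```

Calling `md_member_report md0` (md0 being an example name) prints one line per member. It complements the SMART data rather than replacing it, since the drive's own counters cover errors md never saw.
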

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
