Re: [PATCH] proactive raid5 disk replacement for 2.6.11, updated

Lars Marowsky-Bree <lmb@xxxxxxx> · Thu, 18 Aug 2005 12:24:58 +0200

On 2005-08-18T15:28:41, Neil Brown <neilb@xxxxxxxxxxxxxxx> wrote:

> If we want to mirror a single drive in a raid5 array, I would really
> like to do that using the raid1 personality.
> e.g.
>    suspend io
>    remove the drive
>    build a raid1 (with no superblock) using the drive.
>    add that back into the array
>    resume io.

I hate to say this, but this is something the Device Mapper
framework, with its suspend/resume options and the ability to change
the mapping atomically, already handles well.

Maybe copying some of the ideas would be useful.

Freeze, reconfigure one disk to be RAID1, resume - all IO goes on while
at the same time said RAID1 re-mirrors to the new disk. Repeat with a
removal later.
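
To make the sequence concrete, here is a minimal sketch as a toy Python
model. It is not the Device Mapper or md API; `ToyArray` and all of its
methods are made up for illustration. The only point is that IO
submitted while the array is frozen gets queued, and the member table
changes in one step while nothing is in flight.

```python
class ToyArray:
    """Toy model of an array whose member table is swapped while IO is frozen."""

    def __init__(self, members):
        self.members = list(members)   # member devices (names only, in this model)
        self.suspended = False
        self.pending = []              # IO queued while suspended
        self.completed = []            # (request, member table seen by the request)

    def submit(self, request):
        # While frozen, IO is queued instead of being failed or dropped.
        if self.suspended:
            self.pending.append(request)
        else:
            self.completed.append((request, tuple(self.members)))

    def suspend(self):
        self.suspended = True

    def replace_member(self, index, new_member):
        # Only allowed while frozen, so no request sees a half-updated table.
        if not self.suspended:
            raise RuntimeError("array must be suspended before remapping")
        self.members[index] = new_member

    def resume(self):
        # Unfreeze, then replay everything that was queued in the meantime.
        self.suspended = False
        queued, self.pending = self.pending, []
        for request in queued:
            self.submit(request)


array = ToyArray(["sda", "sdb", "sdc"])
array.submit("write-1")
array.suspend()
array.submit("write-2")                           # queued, not lost
array.replace_member(1, ("raid1", "sdb", "sdd"))  # sdb is now mirrored to sdd
array.resume()
```

In this model "write-2" completes against the new table, with the raid1
pair in slot 1, which is the behaviour one would want from the real
thing.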

> To handle read failures, I would like the first step to be to re-write
> the failed block.  I believe most (all?) drives will relocate the
> block if a write cannot succeed at the normal location, so this will
> often fix the problem.  

Yes. This would be highly useful.
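
A rough sketch of what re-writing the failed block could look like,
again as a toy Python model and not md code. `ToyDisk` and
`read_with_repair` are hypothetical names, and the model simply assumes
that a successful write makes the block readable again (i.e. the drive
relocated it). The reconstruction is the usual raid5 one: XOR of the
same block on all other members.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-sized byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyDisk:
    def __init__(self, blocks, bad=()):
        self.blocks = list(blocks)
        self.bad = set(bad)          # block numbers that fail on read

    def read(self, n):
        if n in self.bad:
            raise OSError(f"read error at block {n}")
        return self.blocks[n]

    def write(self, n, data):
        # Assumption: a successful write remaps the sector and clears the error.
        self.blocks[n] = data
        self.bad.discard(n)


def read_with_repair(disks, index, n):
    """Read block n from disks[index]; on error, rebuild it and write it back."""
    try:
        return disks[index].read(n)
    except OSError:
        # Reconstruct from the other data blocks plus parity.
        others = [d.read(n) for i, d in enumerate(disks) if i != index]
        data = xor_blocks(others)
        disks[index].write(n, data)
        return data
```

If one of the other members also fails to read, the exception simply
propagates here; a real implementation would have to fail the stripe.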

> A userspace process can then notice an unacceptable failure rate and
> start a mirror/swap process as above.

Agreed. Combined with SMART monitoring, this could provide highly useful
features.
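
As an illustration of the userspace side, a minimal sketch of a monitor
that counts corrected read errors per drive in a sliding time window.
The class name, the threshold and the window length are all
assumptions; how the errors would actually be reported to userspace is
left open.

```python
from collections import deque


class ErrorRateMonitor:
    """Flag a drive once it logs too many corrected read errors in a time window."""

    def __init__(self, max_errors=3, window_seconds=3600):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self.events = deque()        # timestamps of recent errors

    def record_error(self, timestamp):
        """Record one error; return True if the drive should be replaced."""
        self.events.append(timestamp)
        # Forget errors that have aged out of the window.
        while self.events and timestamp - self.events[0] > self.window_seconds:
            self.events.popleft()
        return len(self.events) > self.max_errors
```

A True result would be the trigger to start the mirror/swap sequence
for that drive.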

> This possibly doesn't handle the possibility of a write failing very
> well, but I'm not sure what your approach does in that case.  Could
> you explain that?

I think a failed write can't really be handled - it might be retried
once or twice, but then the way to proceed is to kick the drive and
rebuild the array.
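
In sketch form, with a hypothetical `write` callable standing in for
the device and a retry count picked out of thin air:

```python
def write_with_retry(write, data, retries=2):
    """Try a write up to 1 + retries times; return True on success.

    `write` is any callable that raises OSError on failure. A False
    result means the caller should kick the drive and rebuild.
    """
    for _ in range(1 + retries):
        try:
            write(data)
            return True
        except OSError:
            continue
    return False
```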

> It also means that if the raid1 rebuild hits a read-error it cannot
> cope whereas your code would just reconstruct the block from the rest
> of the raid5.

Good point. One way to fix this would be to have a callback to one level
up "Hi, I can't read this section, can you reconstruct and give it to
me?". (Which is a pretty ugly hack.)
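
Roughly like this, as a toy Python sketch: the raid1 copy loop gets a
`reconstruct` callback from the layer above and uses it only for blocks
it cannot read. All names here are hypothetical; nothing like this
interface exists between the personalities today.

```python
def rebuild_mirror(read_block, write_block, reconstruct, nblocks):
    """Copy nblocks from the old disk to the new one.

    read_block(n) raises OSError on a read error; reconstruct(n) is the
    hypothetical callback into the layer above (raid5), which rebuilds
    block n from the other members. Returns the blocks that needed it.
    """
    reconstructed = []
    for n in range(nblocks):
        try:
            data = read_block(n)
        except OSError:
            # Ask one level up for the data instead of giving up.
            data = reconstruct(n)
            reconstructed.append(n)
        write_block(n, data)
    return reconstructed
```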

However, that would also assume that the data on the disk which _can_ be
read can still be trusted. I'm not sure I'd buy that myself without some
verification. But a periodic background consistency check for RAID might
help convince users that this is indeed the case ;-)
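
For raid5 such a check boils down to verifying that the data and parity
blocks of every stripe XOR to zero. A minimal sketch, with the disks
modelled as plain lists of equal-sized byte blocks (the function name
and the data layout are assumptions for illustration):

```python
def find_inconsistent_stripes(disks):
    """Return indices of stripes whose blocks do not XOR to zero.

    `disks` is a list of per-disk block lists (bytes of equal size),
    one of which holds the parity for each stripe.
    """
    bad = []
    for stripe, blocks in enumerate(zip(*disks)):
        acc = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                acc[i] ^= byte
        # Any non-zero byte means data and parity disagree.
        if any(acc):
            bad.append(stripe)
    return bad
```

Note that this only tells you that a stripe is inconsistent, not which
member holds the bad block.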

If you can no longer pro-actively reconstruct the disk because it has
indeed failed, maybe treating it like a failed disk and rebuilding the
array in the "classic" fashion isn't the worst idea, though.

Sincerely,
    Lars Marowsky-Brée <lmb@xxxxxxx>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
	 -- Charles Darwin
