RE: Requesting replace mode for changing a disk

"Guy Watkins" <linux-raid@xxxxxxxxxxxxxxxx> · Sat, 9 May 2009 22:20:07 -0400

} -----Original Message-----
} From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx]
} On Behalf Of Bill Davidsen
} Sent: Saturday, May 09, 2009 7:08 PM
} To: Goswin von Brederlow
} Cc: linux-raid@xxxxxxxxxxxxxxx
} Subject: Re: Requesting replace mode for changing a disk
} 
} Goswin von Brederlow wrote:
} > Hi,
} >
} > consider the following situation: You have a software raid that runs
} > fine but one disk is suspect (e.g. SMART says failure imminent or
} > something). How do you replace that disk?
} >
} > Currently you have to fail/remove the disk from the raid, add a
} > fresh disk and resync. That leaves a large window in which redundancy
} > is compromised. With current disk sizes that can be days.
} >
} > It would be nice if one could tell the kernel to replace a disk in a
} > raid set with a spare without the need to degrade the raid.
} >
} > Thoughts?
} >
} 
} This is one of many things proposed occasionally here, no real
} objection, sometimes loud support, but no one actually *does* the code.
} 
} You have described the problem exactly, and the solution is still to do
} it manually. But you don't need to fail the drive long term, if you can
} stop the array for a few moments. You stop the array, remove the suspect
} drive, create a raid1 of the suspect drive marked write-mostly and the
} new spare, then add the raid1 in place of the suspect drive. For any
} chunks already present on the new drive the reads will go there, reducing
} access to the suspect drive, while data is copied from the old to the new
} in resync, and writes still go to the old suspect drive so if the new
} drive fails you are no worse off. When the raid1 is clean you stop the
} main array and back the suspect drive out.
} 
} This is complicated enough that I totally agree a hot migrate would be
} desirable. This is why people use lvm, although I make no claim that
} this problem is any easier to solve there; I'm just not an lvm guru (or
} even a newbie, just an occasional user).

If the disk is suspect, I would expect read errors!
If there is even one bad block on the suspect disk, this process will fail,
because the raid1 resync has no other copy of that block to read from.
If the logic were built into md, any read errors during the replacement could
be recovered from the other disk or disks in the array.
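
To make the difference concrete, here is a toy model in Python. It is only a
sketch under simplifying assumptions, not how md works internally: it assumes a
single-parity layout where the parity block is the XOR of the data blocks in
the same stripe, and the names Disk, copy_plain and copy_with_redundancy are
made up for the illustration.

```python
from functools import reduce


class ReadError(Exception):
    """Raised when a block cannot be read from a (simulated) disk."""


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Disk:
    """Hypothetical in-memory disk: a list of blocks plus a set of bad block numbers."""

    def __init__(self, blocks, bad=()):
        self.blocks = list(blocks)
        self.bad = set(bad)

    def read(self, index):
        if index in self.bad:
            raise ReadError(f"unreadable block {index}")
        return self.blocks[index]


def copy_plain(suspect):
    """Mirror-style copy: the suspect disk is the only source, so one bad block aborts it."""
    return [suspect.read(i) for i in range(len(suspect.blocks))]


def copy_with_redundancy(suspect, peers):
    """Replace-style copy: on a read error, rebuild the block from the other members.

    Assumes single parity, so a missing block is the XOR of the blocks at the
    same index on every other member (data and parity alike).
    """
    copied = []
    for i in range(len(suspect.blocks)):
        try:
            copied.append(suspect.read(i))
        except ReadError:
            copied.append(xor_blocks([peer.read(i) for peer in peers]))
    return copied


if __name__ == "__main__":
    d0 = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
    d1 = [b"\x10\x20", b"\x30\x40", b"\x50\x60"]
    parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]

    suspect = Disk(d1, bad={1})  # the suspect disk has one unreadable block
    try:
        copy_plain(suspect)
    except ReadError as err:
        print("plain copy failed:", err)

    rebuilt = copy_with_redundancy(suspect, [Disk(d0), Disk(parity)])
    print("redundancy-aware copy matches original:", rebuilt == d1)
```

The plain copy corresponds to the raid1-on-top workaround, where the suspect
disk is the only source. The second function corresponds to what a built-in
replace could do, since it can still reach the remaining members of the array.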
