Re: Failed during rebuild (raid5)

Phil Turmel <philip@xxxxxxxxxx> · Fri, 03 May 2013 10:51:53 -0400

On 05/03/2013 09:52 AM, John Stoffel wrote:
> 
> After watching endless threads about RAID5 arrays losing a disk, and
> then losing a second during the rebuild, I wonder if it would make
> sense to:
> 
> - have MD automatically increase all disk timeouts when doing a
>   rebuild.  The idea being that we are more tolerant of a bad sector
>   when rebuilding?  The idea would be to NOT just evict disks when in
>   potentially bad situations without trying really hard.  

This would be conterproductive for those users who actually follow
manufacturer guidelines when selecting drives for their arrays.

Anyways, it's a policy issue that belongs in userspace.  Distros can do
this today if they want.  There's no lack of scripts in this list's
archives.

> - Automatically setup an automatic scrub of the array that happens
>   weekly unless you explicitly turn it off.  This would possibly
>   require changes from the distros, but if it could be made a core
>   part of MD so that all the blocks in the array get read each week,
>   that would help with silent failures.

I understand some distros already do this.

> We've got all these compute cycles kicking around that could be used
> to make things even more reliable, we should be using them in some
> smart way.

But the "smart way" varies with the hardware at hand.  There's no "one
size fits all" solution here.

Phil

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html