Re: Server down - failed RAID5 - asking for some assistance

On Thu, 21 Apr 2011 20:32:57 -0600 John Valarti <mdadmuser@xxxxxxxxx> wrote:

> On Thu, Apr 21, 2011 at 1:59 PM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
> .
> > My first thought would be to get /all/ the disks, not just the "failed"
> > ones, out of the machine.  You want to make full images of them (with
> > ddrescue or something similar) to files on another disk, and then work with
> > those images.  ..
> > Once you've got some (hopefully most) of your data recovered from the
> > images, buy four /new/ disks to put in the machine, and work on your
> > restore.  You don't want to reuse the failing disks, and probably the other
> > two equally old and worn disks will be high risk too.
> 
> OK, I think I understand.
> Does that mean I need to buy 8 disks, all the same size or bigger?
> The originals are 250GB SATA so that should be OK, I guess.
> 
> I read some more and found out I should run mdadm --examine.
> 
> Should I not be able to just add the one disk partition sdc2 back to the RAID?

Possibly.

It looks like sdb2 failed in October 2009 !!!! and nobody noticed.  So your
array has been running degraded since then.

If you run

 mdadm -A /dev/md1 --force /dev/sd[acd]2

then you will have your array back, though there could be a small amount of
data corruption if the array was in the middle of writing when the system
crashed/died/lost power/whatever happened.
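
You can check that the assembly worked and, assuming the filesystem sits
directly on md1, mount it read-only while you take a copy (the mount point
here is just an example):

 cat /proc/mdstat                    # md1 should be active with 3 of 4 devices
 mdadm --detail /dev/md1             # confirm device states and event counts
 mount -o ro /dev/md1 /mnt/recovery  # keep it read-only until you have a backup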

This will give you access to your data.
How much you trust your drives to continue giving access to your data is up
to you, but you would be wise to at least buy a 1TB drive to copy all the
data onto before you put too much stress on your old drives.
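
Something along these lines should do it once the array is mounted (the
mount points are just placeholders), or you can image the old drives first
with ddrescue as David suggested:

 # file-level copy of the mounted array onto the new 1TB drive
 rsync -aHAX /mnt/recovery/ /mnt/backup/

 # alternatively, image an old drive to a file, keeping a ddrescue map file
 ddrescue /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map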

Once you have a safe copy, you could

 mdadm /dev/md1 --add /dev/sdb2

This will add sdb2 to the array and recover the data for sdb2 from the data
and parity on the other drives.  If this works - great.  However there is a
reasonable chance you will hit a read error on one of the old drives, in
which case the recovery will abort and you will still have your data on the
degraded array.
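
You can watch the rebuild and see whether it completes or hits a read error
with something like:

 cat /proc/mdstat          # shows recovery progress as a percentage
 mdadm --detail /dev/md1   # reports "recovering" while the rebuild is running
 dmesg | tail              # read errors on the old drives will show up here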

You could possibly run a bad-blocks test on each drive (which will be
destructive - but you have a backup on the 1TB drive) and decide whether you
want to throw them out or keep using them.
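
A minimal example of such a destructive test (only on a drive that is out of
the array, and only once your backup is safe) would be:

 badblocks -wsv /dev/sdb   # -w: destructive write test, -s: show progress, -v: verbose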


Whatever you do, once you have a working array again that you feel happy to
trust, make sure a 'check' run happens regularly.  Some distros provide a
cron job to do this for you.  It involves simply
   echo check > /sys/block/md1/md/sync_action

This will read every block on every device to make sure there are no sleeping
bad blocks.  Every month is probably a reasonable frequency to run it.
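
If your distro doesn't provide such a cron job, a root crontab entry along
these lines (the schedule is just an example) will do:

 # run a full check of md1 at 01:00 on the first day of every month
 0 1 1 * *  echo check > /sys/block/md1/md/sync_action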

Also run "mdadm --monitor" configured to send you email if there is a drive
failure, and run "mdadm --monitor --oneshot" from a daily cron job so that if
you have a degraded array it will keep nagging you about it.
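
A minimal setup would look something like this (the address is obviously just
a placeholder):

 # in /etc/mdadm.conf (or /etc/mdadm/mdadm.conf on Debian-like systems)
 MAILADDR you@example.com

 # run the monitor as a daemon, checking the arrays every 30 minutes
 mdadm --monitor --scan --daemonise --delay=1800

 # daily cron entry that re-sends a warning for any degraded array
 0 8 * * *  mdadm --monitor --scan --oneshot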

Good luck,
NeilBrown

> 
> 
> Here is the result of --examine
> 
> /dev/sda2:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
>  Creation Time : Mon May 15 16:38:05 2006
>     Raid Level : raid5
>  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
>     Array Size : 734925312 (700.88 GiB 752.56 GB)
>   Raid Devices : 4
>  Total Devices : 3
> Preferred Minor : 1
> 
>    Update Time : Mon Apr 18 07:48:54 2011
>          State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 1
>  Spare Devices : 0
>       Checksum : 5674ce60 - correct
>         Events : 28580020
> 
>         Layout : left-symmetric
>     Chunk Size : 256K
> 
>      Number   Major   Minor   RaidDevice State
> this     1       8       18        1      active sync   /dev/sdb2
> 
>   0     0       8        2        0      active sync   /dev/sda2
>   1     1       8       18        1      active sync   /dev/sdb2
>   2     2       8       34        2      active sync   /dev/sdc2
>   3     3       0        0        3      faulty removed
> /dev/sdb2:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
>  Creation Time : Mon May 15 16:38:05 2006
>     Raid Level : raid5
>  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
>     Array Size : 734925312 (700.88 GiB 752.56 GB)
>   Raid Devices : 4
>  Total Devices : 4
> Preferred Minor : 1
> 
>    Update Time : Sun Oct 18 10:04:06 2009
>          State : active
> Active Devices : 4
> Working Devices : 4
> Failed Devices : 0
>  Spare Devices : 0
>       Checksum : 5171dcb2 - correct
>         Events : 20333614
> 
>         Layout : left-symmetric
>     Chunk Size : 256K
> 
>      Number   Major   Minor   RaidDevice State
> this     3       8       50        3      active sync   /dev/sdd2
> 
>   0     0       8        2        0      active sync   /dev/sda2
>   1     1       8       18        1      active sync   /dev/sdb2
>   2     2       8       34        2      active sync   /dev/sdc2
>   3     3       8       50        3      active sync   /dev/sdd2
> /dev/sdc2:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
>  Creation Time : Mon May 15 16:38:05 2006
>     Raid Level : raid5
>  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
>     Array Size : 734925312 (700.88 GiB 752.56 GB)
>   Raid Devices : 4
>  Total Devices : 3
> Preferred Minor : 1
> 
>    Update Time : Mon Apr 18 07:48:51 2011
>          State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 1
>  Spare Devices : 0
>       Checksum : 5674ce6b - correct
>         Events : 28580018
> 
>         Layout : left-symmetric
>     Chunk Size : 256K
> 
>      Number   Major   Minor   RaidDevice State
> this     2       8       34        2      active sync   /dev/sdc2
> 
>   0     0       8        2        0      active sync   /dev/sda2
>   1     1       8       18        1      active sync   /dev/sdb2
>   2     2       8       34        2      active sync   /dev/sdc2
>   3     3       0        0        3      faulty removed
> /dev/sdd2:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : ddf4d448:36afa319:f0917855:03f8bbe8
>  Creation Time : Mon May 15 16:38:05 2006
>     Raid Level : raid5
>  Used Dev Size : 244975104 (233.63 GiB 250.85 GB)
>     Array Size : 734925312 (700.88 GiB 752.56 GB)
>   Raid Devices : 4
>  Total Devices : 3
> Preferred Minor : 1
> 
>    Update Time : Mon Apr 18 07:48:54 2011
>          State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 1
>  Spare Devices : 0
>       Checksum : 5674ce4e - correct
>         Events : 28580020
> 
>         Layout : left-symmetric
>     Chunk Size : 256K
> 
>      Number   Major   Minor   RaidDevice State
> this     0       8        2        0      active sync   /dev/sda2
> 
>   0     0       8        2        0      active sync   /dev/sda2
>   1     1       8       18        1      active sync   /dev/sdb2
>   2     2       8       34        2      active sync   /dev/sdc2
>   3     3       0        0        3      faulty removed


