Re: Removed disk vs. failed disk?

On Tue, 05 May 2015 10:42:29 -0600 Hans Malissa <hmalissa@xxxxxx> wrote:

> I’m somewhat new to using RAID, but I may be looking at a broken hard drive already. I’ve been looking at the documentation of mdadm and Linux software RAID, but I’m not sure if I understand everything correctly. So I apologize if some of the questions have already been answered elsewhere, but I need to get this thing running again as soon as possible.
> I cannot mount the RAID (/dev/md0) anymore. /proc/mdstat looks like this:
> 
> # cat /proc/mdstat
> Personalities : [raid1] 
> md0 : active raid1 sdb1[0]
>       976760640 blocks super 1.0 [2/1] [U_]
>       bitmap: 3/8 pages [12KB], 65536KB chunk
> 
> unused devices: <none>

This state shouldn't stop the filesystem on the array from being mounted.
It is a RAID1 with one missing device, so all the data is still present on
the other device.

What did you expect to be on /dev/md0?
What does "fsck /dev/md0" show?
What does "mount /dev/md0 /mnt" show ??



> 
> and mdadm --detail looks like this:
> 
> # mdadm --detail /dev/md0
> /dev/md0:
>         Version : 1.0
>   Creation Time : Sun Dec 15 16:03:28 2013
>      Raid Level : raid1
>      Array Size : 976760640 (931.51 GiB 1000.20 GB)
>   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>    Raid Devices : 2
>   Total Devices : 1
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>     Update Time : Tue May  5 10:17:03 2015
>           State : active, degraded 
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 0
>   Spare Devices : 0
> 
>            Name : eprb21:0  (local to host eprb21)
>            UUID : 34d12cbd:eef71d8d:14dcf224:dfe6c013
>          Events : 3509
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       17        0      active sync   /dev/sdb1
>        1       0        0        1      removed
> 
> So it looks like the second disk in the array has some problem. First, I’m not sure what the difference between ‘removed’ and ‘failed’ is, because the disk is physically still present. How does mdadm differentiate between the two states?

A "failed" device is not much different from a "removed" device.

When md has a working device and discovers that it isn't working any more,
typically because a 'write' failed, it marks it as 'FAULTY'.

You can then remove it from the array with "mdadm --remove ...."
Then it will be "removed".

When you assemble an array, e.g. at boot time, any device that has previously
failed will not be allowed into the array.  So it won't appear as 'faulty'
in the description of the array; it will just be "removed".
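
For what it's worth, those transitions map onto commands roughly like this
(/dev/sdX1 is only a placeholder here; your --detail output no longer shows
the second member's device name):

  # mdadm /dev/md0 --fail /dev/sdX1     # mark a member as 'faulty'
  # mdadm /dev/md0 --remove /dev/sdX1   # it then shows as 'removed'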


> I understand that the next step would be to put in a new hard drive and rebuild the array. Is there a way to figure out right away if the data on the intact disk is uncompromised?

Certainly you should check that the data you have is good before adding the
other device back into the array.  'fsck' and 'mount' are what I would
suggest, but it depends on what you think is stored on the array.
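
Once you are happy with the data, a rough sketch for getting back to two
devices (again, /dev/sdX1 is just a placeholder for the second member or
its replacement):

  # mdadm /dev/md0 --re-add /dev/sdX1   # old member: quick bitmap resync
  # mdadm /dev/md0 --add /dev/sdX1      # new disk/partition: full rebuild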

NeilBrown



> Best regards, and thanks for your help,
> 
> Hans Malissa
