Re: Some md/mdadm bugs

On Thu, 02 Feb 2012 20:08:53 +0100 Asdo <asdo@xxxxxxxxxxxxx> wrote:

> Hello list
> 
> I removed sda from the system and I confirmed /dev/sda did not exist any 
> more.
> After some time an I/O was issued to the array and sda6 was failed by MD 
> in /dev/md5:
> 
> md5 : active raid1 sdb6[2] sda6[0](F)
>        10485688 blocks super 1.0 [2/1] [_U]
>        bitmap: 1/160 pages [4KB], 32KB chunk
> 
> At this point I tried:
> 
> mdadm /dev/md5 --remove detached
> --> no effect !
> mdadm /dev/md5 --remove failed
> --> no effect !

What version of mdadm? (mdadm --version).
These stopped working at one stage and were fixed in 3.1.5.
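
A rough shell sketch for checking whether a given mdadm is at least 3.1.5.
It assumes the usual "mdadm - vX.Y.Z - date" banner from --version; the
function names are only illustrative and are not part of mdadm.

```shell
#!/bin/sh
# Extract "X.Y.Z" from a banner such as "mdadm - v3.1.5 - 23rd March 2011".
# (The banner format is an assumption; adjust the pattern if yours differs.)
mdadm_version_from() {
    printf '%s\n' "$1" | sed -n 's/.*v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Exit 0 if dotted version $1 is >= dotted version $2, comparing each
# numeric field in turn; missing fields count as 0.
version_at_least() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example use (needs mdadm installed; 2>&1 in case the banner goes to stderr):
#   v=$(mdadm_version_from "$(mdadm --version 2>&1)")
#   version_at_least "$v" 3.1.5 || echo "mdadm $v predates the 3.1.5 fix"
```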


> mdadm /dev/md5 --remove /dev/sda6
> --> mdadm: cannot find /dev/sda6: No such file or directory  (!!!)
> mdadm /dev/md5 --remove sda6
> --> finally worked ! (I don't know how I had the idea to actually try 
> this...)

Well done.

> 
> 
> Then here is another array:
> 
> md1 : active raid1 sda2[0] sdb2[2]
>        10485688 blocks super 1.0 [2/2] [UU]
>        bitmap: 0/1 pages [0KB], 65536KB chunk
> 
> This one did not even realize that sda was removed from the system long ago.

Nobody told it.

> Apparently only when an I/O is issued, mdadm realizes the drive is not 
> there anymore.

Only when there is IO, or someone tells it.
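
One way of telling it, as a sketch only: md exposes a per-member "state"
file in sysfs, and writing "faulty" to it marks that member failed without
waiting for I/O. The function name and the SYSFS_ROOT override are made up
for illustration; the array and member names are taken from your example.

```shell
#!/bin/sh
# Mark one member of an md array as faulty through sysfs.
#   $1 = array name (e.g. md1), $2 = member kernel name (e.g. sda2)
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a scratch directory instead of the real /sys.
md_mark_faulty() {
    md="$1"
    member="$2"
    root="${SYSFS_ROOT:-/sys}"
    state="$root/block/$md/md/dev-$member/state"
    if [ ! -w "$state" ]; then
        echo "md_mark_faulty: no writable $state" >&2
        return 1
    fi
    echo faulty > "$state"
}

# Example use (as root, on a real array):
#   md_mark_faulty md1 sda2
```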

> I am wondering (and this would be very serious) what happens if a new 
> drive is inserted and it takes the /dev/sda identifier!? Would MD start 
> writing or do any operation THERE!?

Wouldn't happen.  As long as md holds onto the shell of the old sda nothing
else will get the name 'sda'.

> 
> There is another problem...
> I tried to make MD realize that the drive is detached:
> 
> mdadm /dev/md1 --fail detached
> --> no effect !
> however:
> ls /dev/sda2
> --> ls: cannot access /dev/sda2: No such file or directory
> so "detached" also seems broken...

Before 3.1.5 it was.  If you are using a newer mdadm I'll need to look into
it.

> 
> 
> 
> And here goes also a feature request:
> 
> if a device is detached from the system, (echo 1 > device/delete or 
> removing via hardware hot-swap + AHCI) MD should detect this situation 
> and mark the device (and all its partitions) as failed in all arrays, or 
> even remove the device completely from the RAID.

This needs to be done via a udev rule.
That is why --remove understands names like "sda6" (no /dev).

When a device is removed, udev processes the remove notification.
The rule

ACTION=="remove", RUN+="/sbin/mdadm -If $name"

in /etc/udev/rules.d/something.rules

will make that happen.
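
As a sketch, installing such a rule could look like this. The default file
name below is made up, and the SUBSYSTEM=="block" match is an extra
assumption to keep the rule from firing for non-block devices; the RUN part
is the rule given above.

```shell
#!/bin/sh
# Write the remove rule to a udev rules file.
#   $1 = target file (the default name is hypothetical, pick your own)
write_md_remove_rule() {
    rules_file="${1:-/etc/udev/rules.d/65-md-hotremove.rules}"
    cat > "$rules_file" <<'EOF'
# When a block device goes away, have mdadm fail and remove it from any array.
# $name is the kernel name (e.g. "sda6"); its /dev node is already gone.
SUBSYSTEM=="block", ACTION=="remove", RUN+="/sbin/mdadm -If $name"
EOF
}

# Example use (as root), then ask udev to re-read its rules:
#   write_md_remove_rule
#   udevadm control --reload-rules
```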

> In my case I have verified that MD did not realize the device was 
> removed from the system, and only much later when an I/O was issued to 
> the disk, it would mark the device as failed in the RAID.
> 
> After the above is implemented, it could be an idea to actually allow a 
> new disk to take the place of a failed disk automatically if that would 
> be a "re-add" (probably the same failed disk is being reinserted by the 
> operator) and this even if the array is running, and especially if there 
> is a bitmap.

It should do that, providing you have a udev rule like:
ACTION=="add", RUN+="/sbin/mdadm -I $tempnode"

You can even get it to add other devices as spares with e.g.
  policy action=force-spare

though you almost certainly don't want that general a policy.  You would
want to restrict that to certain ports (device paths).
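
A restricted policy might be generated along these lines. This is a sketch:
the domain name and the path glob are placeholders, and the glob has to
match whatever your ports are called under /dev/disk/by-path on your
machine. The resulting line belongs in mdadm.conf.

```shell
#!/bin/sh
# Print an mdadm.conf POLICY line that limits force-spare to devices whose
# by-path name matches a glob.
#   $1 = policy domain name (hypothetical label of your choosing)
#   $2 = glob matched against /dev/disk/by-path names (system specific)
md_spare_policy() {
    printf 'POLICY domain=%s path=%s action=force-spare\n' "$1" "$2"
}

# Example use (placeholder values, append to your mdadm.conf as root):
#   md_spare_policy hotswap 'pci-0000:00:1f.2-*' >> /etc/mdadm.conf
```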


> Now it doesn't happen:
> When I reinserted the disk, udev triggered the --incremental, to 
> reinsert the device, but mdadm refused to do anything because the old 
> slot was still occupied with a failed+detached device. I manually 
> removed the device from the raid then I ran --incremental, but mdadm 
> still refused to re-add the device to the RAID because the array was 
> running. I think that if it is a re-add, and especially if the bitmap is 
> active, I can't think of a situation in which the user would *not* want 
> to do an incremental re-add even if the array is running.

Hmmm.. that doesn't seem right.  What version of mdadm are you running?
Maybe a newer one would get this right.

Thanks for the reports.

NeilBrown


> 
> Thank you
> Asdo