MD or MDADM bug?

This is somewhat of a crosspost from my thread yesterday, but I think it deserves its own thread at the moment. Some time ago I had a device fail; with the help of Neil, Tyler and others on the mailing list, plus a few patches to mdadm, I was able to recover. Using mdadm --remove and mdadm --add, I rebuilt the bad disk in my array. Everything seemed fine; however, when I rebooted and re-assembled the RAID, it wouldn't take the disk that had been re-added. I had to add it again and let it rebuild. About 3 weeks ago I lost power -- the outage lasted longer than the UPS, and my system shut down. Upon startup, once again I had to re-add that disk to the array. For some reason, once I remove a device and add it back, the next time I stop and re-assemble the array it won't start that disk.
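For reference, the remove/add cycle was essentially the following (device names are from memory; /dev/md0 and /dev/sdX stand in for my array and the affected member):

    # pull the failed member out of the array
    mdadm /dev/md0 --remove /dev/sdX
    # add it back so md rebuilds onto that slot
    mdadm /dev/md0 --add /dev/sdX
    # the rebuild completes and everything looks clean, but after a
    # later stop/reboot the re-assembled array leaves that member out:
    mdadm --stop /dev/md0
    mdadm --assemble /dev/md0 <member devices>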

Last night, I had a drive fail. With help from Michael and Forrest, I attempted to rebuild the array by hot-replacing the failed drive, without rebooting, to re-enable disk I/O to that position. The only spare I had available was suspect, and it turned out to be bad: during the rebuild the disk started throwing errors and the array puked:
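The hot-replace itself was just the usual manage-mode sequence (again, a sketch from memory; /dev/sdm is the slot that held the failed drive):

    # mark the dead drive failed (if the kernel hasn't already) and remove it
    mdadm /dev/md0 --fail /dev/sdm --remove /dev/sdm
    # physically swap in the spare, then add it so md rebuilds that slot
    mdadm /dev/md0 --add /dev/sdm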

Aug 31 21:45:40 abyss kernel: raid5: Disk failure on sda, disabling device. Operation continuing on 26 devices
Aug 31 21:45:40 abyss kernel: raid5: Disk failure on sdb, disabling device. Operation continuing on 25 devices
Aug 31 21:45:40 abyss kernel: raid5: Disk failure on sdi, disabling device. Operation continuing on 18 devices
Aug 31 21:45:40 abyss kernel: raid5: Disk failure on sdj, disabling device. Operation continuing on 17 devices
Aug 31 21:45:40 abyss kernel: raid5: Disk failure on sdk, disabling device. Operation continuing on 16 devices
Aug 31 21:45:40 abyss kernel: raid5: Disk failure on sdl, disabling device. Operation continuing on 15 devices
Aug 31 21:45:40 abyss kernel: raid5: Disk failure on sdn, disabling device. Operation continuing on 14 devices

All of these disks tested fine. This has happened once before; simply forcing the RAID to re-assemble fixes the issue, after which I replace the bad disk and re-sync it.
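What I mean by forcing a re-assemble is roughly this (member list abbreviated):

    mdadm --stop /dev/md0
    # --force accepts members whose event counts fell slightly behind when
    # they were kicked out together, so the array can be started again
    mdadm --assemble --force /dev/md0 <member devices>
    # then replace the genuinely bad disk and let it re-sync
    mdadm /dev/md0 --add /dev/sdX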

The problem is that my array is now 26 of 28 disks. /dev/sdm *is* bad: it was removed and re-added, but the replacement drive is faulty. /dev/sdaa, however, is not bad -- but since it was the 'original' disk that was hot removed and re-added so long ago, it doesn't assemble into the RAID. I'm really stuck: I can't start the array, and obviously I can't rebuild the two 'bad' disks. I asked about this once before and was told: no, you shouldn't have to hot-add and re-sync each time after hot-adding a "new" device and the initial rebuild finishes, unless there's another failure after that or an unclean shutdown.
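My working assumption is that /dev/sdaa's superblock still looks stale relative to the rest, which is why assembly keeps leaving it out. Comparing the per-device superblocks should show it (/dev/sdX here is a placeholder for any member that does assemble):

    # dump the version-1 superblock of the member that won't assemble
    mdadm --examine /dev/sdaa
    # compare the Events count and device state against a good member
    mdadm --examine /dev/sdX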

What can I do? I don't believe this is working as intended.

I'm using mdadm 2.0-devel-3 on a Linux 2.6.11.12 kernel, with version-1 superblocks.

-- David


