On Tue, 11 Aug 2009 10:56:02 +1000 (EST), NeilBrown wrote:

> If you look closely at the "mdadm -D" etc output that you included
> you will see that md1 thinks that sdi2 is faulty. Maybe it is.
> You would need to check kernel logs to be sure.

I don't think the drive is bad. SMART values look ok, and md0 didn't have
any problem with re-adding sdi1.

I forgot another strange thing: while I could add sdi1 to md0 and the
rebuild succeeded, I couldn't add sdi2 to md1 until after a reboot. I
always got an error like this:

mdadm: add new device failed for /dev/sdi2: Device or resource busy

When all this happened, I was running 2.6.29.1. Afterwards, I tried
upgrading to 2.6.30.4 to see if that solved the problem, but nothing
changed. (The commands I'd try next time this happens are at the end of
this mail.)

> Yes, bitmaps should prevent a full rebuild. I would need to see
> kernel logs of when this rebuild happened and "mdadm -D" the
> array to have any hope of guessing why it didn't.
>
> NeilBrown

$ mdadm -D /dev/md0
/dev/md0:
        Version : 1.01
  Creation Time : Sat Mar 15 13:28:07 2008
     Raid Level : raid5
     Array Size : 1953535232 (1863.04 GiB 2000.42 GB)
  Used Dev Size : 488383808 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Aug 10 19:29:47 2009
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : quassel:0  (local to host quassel)
           UUID : 1111b4fd:4219035a:f52968e6:cc4dd971
         Events : 650394

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       97        2      active sync   /dev/sdg1
       4       8      129        3      active sync   /dev/sdi1
       5       8       65        4      active sync   /dev/sde1

--- kernel log ---

21:58:14 usb 4-5.2.4: USB disconnect, address 13
21:58:28 usb 4-5.2.4: new high speed USB device using ehci_hcd and address 17
21:58:28 usb 4-5.2.4: configuration #1 chosen from 1 choice
21:58:28 scsi10 : SCSI emulation for USB Mass Storage devices
21:58:28 usb-storage: device found at 17
21:58:28 usb-storage: waiting for device to settle before scanning
21:58:33 usb-storage: device scan complete
21:58:33 scsi 10:0:0:0: Direct-Access     WDC WD10 EACS-00D6B0      PQ: 0 ANSI: 2 CCS
21:58:33 sd 10:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
21:58:33 sd 10:0:0:0: [sdi] Write Protect is off
21:58:33 sd 10:0:0:0: [sdi] Mode Sense: 00 38 00 00
21:58:33 sd 10:0:0:0: [sdi] Assuming drive cache: write through
21:58:33 sd 10:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
21:58:33 sd 10:0:0:0: [sdi] Write Protect is off
21:58:33 sd 10:0:0:0: [sdi] Mode Sense: 00 38 00 00
21:58:33 sd 10:0:0:0: [sdi] Assuming drive cache: write through
21:58:33 sdi: sdi1 sdi2
21:58:33 sd 10:0:0:0: [sdi] Attached SCSI disk
21:58:33 sd 10:0:0:0: Attached scsi generic sg9 type 0

I think here I unmounted the file system and stopped the LVM device on the
array, but I'm not entirely sure. The initial 17-second delay (22:03:57 to
22:04:14 below) suggests that this was the first time the array was accessed
after unplugging the drive, since the drives were all spun down at the time.

22:03:57 md: md0 still in use.
22:03:57 md: md1 still in use.
22:03:57 md: md0 still in use.
22:03:57 md: md1 still in use.
22:04:14 end_request: I/O error, dev sdh, sector 2
22:04:14 md: super_written gets error=-5, uptodate=0
22:04:14 raid5: Disk failure on sdh1, disabling device.
22:04:14 raid5: Operation continuing on 4 devices.
22:04:14 RAID5 conf printout:
22:04:14  --- rd:5 wd:4
22:04:14  disk 0, o:1, dev:sdb1
22:04:14  disk 1, o:1, dev:sdd1
22:04:14  disk 2, o:1, dev:sdg1
22:04:14  disk 3, o:0, dev:sdh1
22:04:14  disk 4, o:1, dev:sde1
22:04:14 RAID5 conf printout:
22:04:14  --- rd:5 wd:4
22:04:14  disk 0, o:1, dev:sdb1
22:04:14  disk 1, o:1, dev:sdd1
22:04:14  disk 2, o:1, dev:sdg1
22:04:14  disk 4, o:1, dev:sde1
22:04:16 md: md0 still in use.
22:04:16 md: md1 still in use.
22:04:16 md: md0 still in use.
22:04:16 md: md1 still in use.
22:04:21 raid5: Disk failure on sdh2, disabling device.
22:04:21 raid5: Operation continuing on 1 devices.
22:04:21 RAID5 conf printout:
22:04:21  --- rd:2 wd:1
22:04:21  disk 0, o:0, dev:sdh2
22:04:21  disk 1, o:1, dev:sde2
22:04:21 RAID5 conf printout:
22:04:21  --- rd:2 wd:1
22:04:21  disk 1, o:1, dev:sde2

/etc/init.d/mdadm-raid stop

This is mdadm 2.6.8 from Debian lenny. The segfault in the log below
probably shouldn't have happened...

22:04:32 md: md0 stopped.
22:04:32 md: unbind<sdb1>
22:04:32 md: export_rdev(sdb1)
22:04:32 md: unbind<sde1>
22:04:32 md: export_rdev(sde1)
22:04:32 md: unbind<sdh1>
22:04:32 md: export_rdev(sdh1)
22:04:32 md: unbind<sdg1>
22:04:32 md: export_rdev(sdg1)
22:04:32 md: unbind<sdd1>
22:04:32 md: export_rdev(sdd1)
22:04:32 mdadm[18096]: segfault at 118 ip 0806a7b9 sp bffb8160 error 4 in mdadm[8048000+2a000]

/etc/init.d/mdadm-raid start

22:04:37 md: md0 stopped.
22:04:38 md: bind<sdd1>
22:04:38 md: bind<sdg1>
22:04:38 md: bind<sdi1>
22:04:38 md: bind<sde1>
22:04:38 md: bind<sdb1>
22:04:38 md: kicking non-fresh sdi1 from array!
22:04:38 md: unbind<sdi1>
22:04:38 md: export_rdev(sdi1)
22:04:38 raid5: device sdb1 operational as raid disk 0
22:04:38 raid5: device sde1 operational as raid disk 4
22:04:38 raid5: device sdg1 operational as raid disk 2
22:04:38 raid5: device sdd1 operational as raid disk 1
22:04:38 raid5: allocated 5255kB for md0
22:04:38 raid5: raid level 5 set md0 active with 4 out of 5 devices, algorithm 2
22:04:38 RAID5 conf printout:
22:04:38  --- rd:5 wd:4
22:04:38  disk 0, o:1, dev:sdb1
22:04:38  disk 1, o:1, dev:sdd1
22:04:38  disk 2, o:1, dev:sdg1
22:04:38  disk 4, o:1, dev:sde1
22:04:38 md0: bitmap initialized from disk: read 1/1 pages, set 1 bits
22:04:38 created bitmap (8 pages) for device md0
22:04:38 md0: detected capacity change from 0 to 2000420077568
22:04:38 md0: unknown partition table

mdadm /dev/md0 -a /dev/sdi1

22:05:21 md: bind<sdi1>
22:05:21 RAID5 conf printout:
22:05:21  --- rd:5 wd:4
22:05:21  disk 0, o:1, dev:sdb1
22:05:21  disk 1, o:1, dev:sdd1
22:05:21  disk 2, o:1, dev:sdg1
22:05:21  disk 3, o:1, dev:sdi1
22:05:21  disk 4, o:1, dev:sde1
22:05:21 md: recovery of RAID array md0
22:05:21 md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
22:05:21 md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
22:05:21 md: using 128k window, over a total of 488383808 blocks.

This is probably where I tried to add sdi2 to md1 without any luck.

22:05:54 md: export_rdev(sdi2)
22:05:55 md: export_rdev(sdi2)
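PS: Next time a member drops out, I'd like to look at the bitmap state before
re-adding anything. From the man page I think the commands below are roughly
right, but I'm not sure the --examine-bitmap invocation is correct for an
internal bitmap, so please correct me if this is the wrong approach (device
names are just the ones from above):

$ cat /proc/mdstat                    # current array / recovery state
$ mdadm --examine /dev/sdi2           # member superblock, event count
$ mdadm --examine-bitmap /dev/sdi2    # dump the write-intent bitmap
$ mdadm /dev/md1 --re-add /dev/sdi2   # re-add so only dirty chunks get resynced

If --re-add also fails with "Device or resource busy", I'll save the kernel
log from right around that moment instead of rebooting first.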