On Mon, Jul 25, 2011 at 3:33 PM, Matthew Tice <mjtice@xxxxxxxxx> wrote:
> On Mon, Jul 25, 2011 at 3:30 PM, Matthew Tice <mjtice@xxxxxxxxx> wrote:
>> On Mon, Jul 25, 2011 at 3:21 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
>>> On Mon Jul 25, 2011 at 03:04:34PM -0600, Matthew Tice wrote:
>>>
>>>> Well, things are a lot different now - I'm unable to start the array
>>>> successfully. I removed an older, non-relevant drive that was giving
>>>> me SMART errors - when I rebooted, the drive assignments shifted (not
>>>> sure this really matters, though).
>>>>
>>>> Now when I try to start the array I get:
>>>>
>>>> # mdadm -A -f /dev/md0
>>>> mdadm: no devices found for /dev/md0
>>>>
>>>> I can nudge it slightly with auto-detect:
>>>>
>>>> # mdadm --auto-detect
>>>>
>>>> Then I try to assemble the array with:
>>>>
>>>> # mdadm -A -f /dev/md0 /dev/sd[bcde]
>>>> mdadm: cannot open device /dev/sde: Device or resource busy
>>>> mdadm: /dev/sde has no superblock - assembly aborted
>>>>
>>> <- SNIP ->
>>>> │  └─sde: [8:64] MD raid5 (none/4) 931.51g md_d0 inactive spare
>>> <- SNIP ->
>>>>
>>>> I've looked but I'm unable to find where the drive is in use.
>>>
>>> lsdrv shows that it's in use in array md_d0 - presumably this is a
>>> part-assembled array (possibly auto-assembled by the kernel). Try
>>> stopping that first, then doing the "mdadm -A -f /dev/md0 /dev/sd[bcde]".
>>>
>>
>> Nice catch, thanks, Robin.
>>
>> I stopped /dev/md_d0, then started the array on /dev/md0:
>>
>> # mdadm -A -f /dev/md0 /dev/sd[bcde]
>> mdadm: /dev/md0 has been started with 3 drives (out of 4).
>>
>> It's only seeing three of the four drives. I ran an fsck on it just
>> in case, but it failed:
>>
>> # fsck -n /dev/md0
>> fsck from util-linux-ng 2.17.2
>> e2fsck 1.41.12 (17-May-2010)
>> Superblock has an invalid journal (inode 8).
>> Clear? no
>>
>> fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0
>>
>> It looks like /dev/sde is missing (as also noted above):
>>
>> # mdadm --detail /dev/md0
>> /dev/md0:
>>         Version : 00.90
>>   Creation Time : Sat Mar 12 21:22:34 2011
>>      Raid Level : raid5
>>      Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
>>   Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
>>    Raid Devices : 4
>>   Total Devices : 3
>> Preferred Minor : 0
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Mon Jul 25 14:08:30 2011
>>           State : clean, degraded
>>  Active Devices : 3
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>            UUID : daf06d5a:b80528b1:2e29483d:f114274d (local to host storage)
>>          Events : 0.5593
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       0        0        0      removed
>>        1       8       48        1      active sync   /dev/sdd
>>        2       8       32        2      active sync   /dev/sdc
>>        3       8       16        3      active sync   /dev/sdb
>>
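For reference, the way I'm checking why sde gets dropped is to compare the
per-device event counters in the on-disk superblocks - my understanding
from the mdadm(8) man page is that a member whose counter lags the others
gets treated as stale ("non-fresh") at assembly. A minimal check, assuming
all four superblocks are still readable:

# mdadm --examine /dev/sd[bcde] | grep -E '^/dev|Events'

If sde shows a lower Events value than sdb/sdc/sdd, that would line up
with the kernel kicking it when I assemble.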
>
> One other strange thing I just noticed - /dev/sde keeps getting added
> back into /dev/md_d0 (after I start the array on /dev/md0):
>
> # /usr/local/bin/lsdrv
> **Warning** The following utility(ies) failed to execute:
>     pvs
>     lvs
> Some information may be missing.
>
> PCI [ata_piix] 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7
> Family) IDE Controller (rev 01)
> ├─scsi 0:0:0:0 LITE-ON COMBO SOHC-4836K {2006061700044437}
> │  └─sr0: [11:0] Empty/Unknown 1.00g
> └─scsi 1:x:x:x [Empty]
> PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation N10/ICH7
> Family SATA IDE Controller (rev 01)
> ├─scsi 2:x:x:x [Empty]
> └─scsi 3:0:0:0 ATA HDS728080PLA380 {PFDB20S4SNLT6J}
>    └─sda: [8:0] Partitioned (dos) 76.69g
>       ├─sda1: [8:1] (ext4) 75.23g {960433b3-af56-41bd-bb9a-d0a0fb5ffb45}
>       │  └─Mounted as /dev/disk/by-uuid/960433b3-af56-41bd-bb9a-d0a0fb5ffb45 @ /
>       ├─sda2: [8:2] Partitioned (dos) 1.00k
>       └─sda5: [8:5] (swap) 1.46g {10c3b226-16d4-44ea-ad1e-6296bb92969d}
> PCI [sata_sil24] 04:00.0 RAID bus controller: Silicon Image, Inc. SiI
> 3132 Serial ATA Raid II Controller (rev 01)
> ├─scsi 4:0:0:0 ATA WDC WD7500AADS-0 {WD-WCAV59574584}
> │  └─sdb: [8:16] MD raid5 (3/4) 698.64g md0 clean in_sync
>       {daf06d5a-b805-28b1-2e29-483df114274d}
> │     └─md0: [9:0] (ext3) 2.05t {a9a38e8e-d54d-407d-a786-31410ad6e17d}
> ├─scsi 4:1:0:0 ATA WDC WD7500AADS-0 {WD-WCAV59459025}
> │  └─sdc: [8:32] MD raid5 (2/4) 698.64g md0 clean in_sync
>       {daf06d5a-b805-28b1-2e29-483df114274d}
> ├─scsi 4:2:0:0 ATA Hitachi HDS72101 {JP9911HZ1SKHNU}
> │  └─sdd: [8:48] MD raid5 (1/4) 931.51g md0 clean in_sync
>       {daf06d5a-b805-28b1-2e29-483df114274d}
> ├─scsi 4:3:0:0 ATA Hitachi HDS72101 {JP9960HZ1VK96U}
> │  └─sde: [8:64] MD raid5 (none/4) 931.51g md_d0 inactive spare
>       {daf06d5a-b805-28b1-2e29-483df114274d}
> │     └─md_d0: [254:0] Empty/Unknown 0.00k
> └─scsi 7:x:x:x [Empty]
>
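What I still don't understand is what keeps grabbing sde into the
partitionable md_d0 - Robin suggested kernel auto-assembly, and I suppose
a udev/mdadm incremental rule could do the same. For now I clear it by
hand before each assembly attempt:

# mdadm --stop /dev/md_d0

I'm also guessing - untested, so treat it as a sketch - that pinning the
array in /etc/mdadm/mdadm.conf would keep anything else from claiming the
member disks:

DEVICE /dev/sd[bcde]
ARRAY /dev/md0 UUID=daf06d5a:b80528b1:2e29483d:f114274d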
Here is something interesting from syslog.

1. I stop /dev/md_d0:

Jul 25 15:38:56 localhost kernel: [ 4272.658244] md: md_d0 stopped.
Jul 25 15:38:56 localhost kernel: [ 4272.658258] md: unbind<sde>
Jul 25 15:38:56 localhost kernel: [ 4272.658271] md: export_rdev(sde)

2. I assemble /dev/md0 with:

# mdadm -A /dev/md0 /dev/sd[bcde]
mdadm: /dev/md0 has been started with 3 drives (out of 4).

Jul 25 15:41:33 localhost kernel: [ 4429.537035] md: md0 stopped.
Jul 25 15:41:33 localhost kernel: [ 4429.545447] md: bind<sde>
Jul 25 15:41:33 localhost kernel: [ 4429.545644] md: bind<sdc>
Jul 25 15:41:33 localhost kernel: [ 4429.545810] md: bind<sdb>
Jul 25 15:41:33 localhost kernel: [ 4429.546827] md: bind<sdd>
Jul 25 15:41:33 localhost kernel: [ 4429.546876] md: kicking non-fresh sde from array!
Jul 25 15:41:33 localhost kernel: [ 4429.546883] md: unbind<sde>
Jul 25 15:41:33 localhost kernel: [ 4429.546890] md: export_rdev(sde)
Jul 25 15:41:33 localhost kernel: [ 4429.565035] md/raid:md0: device sdd operational as raid disk 1
Jul 25 15:41:33 localhost kernel: [ 4429.565041] md/raid:md0: device sdb operational as raid disk 3
Jul 25 15:41:33 localhost kernel: [ 4429.565045] md/raid:md0: device sdc operational as raid disk 2
Jul 25 15:41:33 localhost kernel: [ 4429.565631] md/raid:md0: allocated 4222kB
Jul 25 15:41:33 localhost kernel: [ 4429.573438] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
Jul 25 15:41:33 localhost kernel: [ 4429.574754] RAID conf printout:
Jul 25 15:41:33 localhost kernel: [ 4429.574757]  --- level:5 rd:4 wd:3
Jul 25 15:41:33 localhost kernel: [ 4429.574761]  disk 1, o:1, dev:sdd
Jul 25 15:41:33 localhost kernel: [ 4429.574765]  disk 2, o:1, dev:sdc
Jul 25 15:41:33 localhost kernel: [ 4429.574768]  disk 3, o:1, dev:sdb
Jul 25 15:41:33 localhost kernel: [ 4429.574863] md0: detected capacity change from 0 to 2250468753408
Jul 25 15:41:33 localhost kernel: [ 4429.575092] md0: unknown partition table
Jul 25 15:41:33 localhost kernel: [ 4429.626140] md: bind<sde>

So /dev/sde is "non-fresh" and gets kicked at assembly - and then
something immediately binds it again (that final bind<sde>), which must
be how it ends up back in md_d0. The "unknown partition table" line
appears to refer to md0 itself; since the array holds an ext3 filesystem
directly rather than a partition table, I assume that one is harmless.
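Unless someone here warns me off, my plan for getting sde back into the
array - going by the mdadm(8) man page, so this is a sketch rather than a
known-good procedure - is to try a re-add first:

# mdadm /dev/md0 --re-add /dev/sde

and, if mdadm refuses because the superblock is too stale, a plain add,
which as I understand it will rebuild sde from parity with a full resync:

# mdadm /dev/md0 --add /dev/sde
# cat /proc/mdstat    (to watch the recovery progress)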