On Mon, Jul 25, 2011 at 3:42 PM, Matthew Tice <mjtice@xxxxxxxxx> wrote:
> On Mon, Jul 25, 2011 at 3:33 PM, Matthew Tice <mjtice@xxxxxxxxx> wrote:
>> On Mon, Jul 25, 2011 at 3:30 PM, Matthew Tice <mjtice@xxxxxxxxx> wrote:
>>> On Mon, Jul 25, 2011 at 3:21 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
>>>> On Mon Jul 25, 2011 at 03:04:34PM -0600, Matthew Tice wrote:
>>>>
>>>>> Well, things are a lot different now - I'm unable to start the array
>>>>> successfully. I removed an older, non-relevant drive that was giving
>>>>> me SMART errors - when I rebooted, the drive assignments shifted (not
>>>>> sure this really matters, though).
>>>>>
>>>>> Now when I try to start the array I get:
>>>>>
>>>>> # mdadm -A -f /dev/md0
>>>>> mdadm: no devices found for /dev/md0
>>>>>
>>>>> I can nudge it slightly with auto-detect:
>>>>>
>>>>> # mdadm --auto-detect
>>>>>
>>>>> Then I try to assemble the array with:
>>>>>
>>>>> # mdadm -A -f /dev/md0 /dev/sd[bcde]
>>>>> mdadm: cannot open device /dev/sde: Device or resource busy
>>>>> mdadm: /dev/sde has no superblock - assembly aborted
>>>>>
>>>> <- SNIP ->
>>>>> │  └─sde: [8:64] MD raid5 (none/4) 931.51g md_d0 inactive spare
>>>> <- SNIP ->
>>>>>
>>>>> I've looked but I'm unable to find where the drive is in use.
>>>>
>>>> lsdrv shows that it's in use in array md_d0 - presumably this is a
>>>> part-assembled array (possibly auto-assembled by the kernel). Try
>>>> stopping that first, then doing the "mdadm -A -f /dev/md0 /dev/sd[bcde]".
>>>>
>>>
>>> Nice catch, thanks, Robin.
>>>
>>> I stopped /dev/md_d0, then started the array on /dev/md0:
>>>
>>> # mdadm -A -f /dev/md0 /dev/sd[bcde]
>>> mdadm: /dev/md0 has been started with 3 drives (out of 4).
>>>
>>> It's only seeing three of the four drives. I ran an fsck on it just in
>>> case, but it failed:
>>>
>>> # fsck -n /dev/md0
>>> fsck from util-linux-ng 2.17.2
>>> e2fsck 1.41.12 (17-May-2010)
>>> Superblock has an invalid journal (inode 8).
>>> Clear? no
>>>
>>> fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0
>>>
>>> It looks like /dev/sde is missing (as also noted above):
>>>
>>> # mdadm --detail /dev/md0
>>> /dev/md0:
>>>         Version : 00.90
>>>   Creation Time : Sat Mar 12 21:22:34 2011
>>>      Raid Level : raid5
>>>      Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
>>>   Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
>>>    Raid Devices : 4
>>>   Total Devices : 3
>>> Preferred Minor : 0
>>>     Persistence : Superblock is persistent
>>>
>>>     Update Time : Mon Jul 25 14:08:30 2011
>>>           State : clean, degraded
>>>  Active Devices : 3
>>> Working Devices : 3
>>>  Failed Devices : 0
>>>   Spare Devices : 0
>>>
>>>          Layout : left-symmetric
>>>      Chunk Size : 64K
>>>
>>>            UUID : daf06d5a:b80528b1:2e29483d:f114274d (local to host storage)
>>>          Events : 0.5593
>>>
>>>     Number   Major   Minor   RaidDevice State
>>>        0       0        0        0      removed
>>>        1       8       48        1      active sync   /dev/sdd
>>>        2       8       32        2      active sync   /dev/sdc
>>>        3       8       16        3      active sync   /dev/sdb
>>>
>>
>> One other strange thing I just noticed - /dev/sde keeps getting added
>> back into /dev/md_d0 (after I start the array on /dev/md0):
>>
>> # /usr/local/bin/lsdrv
>> **Warning** The following utility(ies) failed to execute:
>>     pvs
>>     lvs
>> Some information may be missing.
>>
>> PCI [ata_piix] 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7
>> Family) IDE Controller (rev 01)
>> ├─scsi 0:0:0:0 LITE-ON COMBO SOHC-4836K {2006061700044437}
>> │  └─sr0: [11:0] Empty/Unknown 1.00g
>> └─scsi 1:x:x:x [Empty]
>> PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation N10/ICH7
>> Family SATA IDE Controller (rev 01)
>> ├─scsi 2:x:x:x [Empty]
>> └─scsi 3:0:0:0 ATA HDS728080PLA380 {PFDB20S4SNLT6J}
>>    └─sda: [8:0] Partitioned (dos) 76.69g
>>       ├─sda1: [8:1] (ext4) 75.23g {960433b3-af56-41bd-bb9a-d0a0fb5ffb45}
>>       │  └─Mounted as /dev/disk/by-uuid/960433b3-af56-41bd-bb9a-d0a0fb5ffb45 @ /
>>       ├─sda2: [8:2] Partitioned (dos) 1.00k
>>       └─sda5: [8:5] (swap) 1.46g {10c3b226-16d4-44ea-ad1e-6296bb92969d}
>> PCI [sata_sil24] 04:00.0 RAID bus controller: Silicon Image, Inc. SiI
>> 3132 Serial ATA Raid II Controller (rev 01)
>> ├─scsi 4:0:0:0 ATA WDC WD7500AADS-0 {WD-WCAV59574584}
>> │  └─sdb: [8:16] MD raid5 (3/4) 698.64g md0 clean in_sync {daf06d5a-b805-28b1-2e29-483df114274d}
>> │     └─md0: [9:0] (ext3) 2.05t {a9a38e8e-d54d-407d-a786-31410ad6e17d}
>> ├─scsi 4:1:0:0 ATA WDC WD7500AADS-0 {WD-WCAV59459025}
>> │  └─sdc: [8:32] MD raid5 (2/4) 698.64g md0 clean in_sync {daf06d5a-b805-28b1-2e29-483df114274d}
>> ├─scsi 4:2:0:0 ATA Hitachi HDS72101 {JP9911HZ1SKHNU}
>> │  └─sdd: [8:48] MD raid5 (1/4) 931.51g md0 clean in_sync {daf06d5a-b805-28b1-2e29-483df114274d}
>> ├─scsi 4:3:0:0 ATA Hitachi HDS72101 {JP9960HZ1VK96U}
>> │  └─sde: [8:64] MD raid5 (none/4) 931.51g md_d0 inactive spare {daf06d5a-b805-28b1-2e29-483df114274d}
>> │     └─md_d0: [254:0] Empty/Unknown 0.00k
>> └─scsi 7:x:x:x [Empty]
>>
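I still don't know what keeps dropping sde into md_d0, but as a
workaround I'm thinking of pinning the array in mdadm.conf so that only
md0 is ever assembled from these disks. If I'm reading mdadm.conf(5)
right, something like this should do it (UUID taken from the --detail
output above):

# /etc/mdadm/mdadm.conf
DEVICE /dev/sd[bcde]
ARRAY /dev/md0 UUID=daf06d5a:b80528b1:2e29483d:f114274d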
>
> Here is something interesting from syslog:
>
> 1. I stop /dev/md_d0:
>
> Jul 25 15:38:56 localhost kernel: [ 4272.658244] md: md_d0 stopped.
> Jul 25 15:38:56 localhost kernel: [ 4272.658258] md: unbind<sde>
> Jul 25 15:38:56 localhost kernel: [ 4272.658271] md: export_rdev(sde)
>
> 2. I assemble /dev/md0 with:
>
> # mdadm -A /dev/md0 /dev/sd[bcde]
> mdadm: /dev/md0 has been started with 3 drives (out of 4).
>
> Jul 25 15:41:33 localhost kernel: [ 4429.537035] md: md0 stopped.
> Jul 25 15:41:33 localhost kernel: [ 4429.545447] md: bind<sde>
> Jul 25 15:41:33 localhost kernel: [ 4429.545644] md: bind<sdc>
> Jul 25 15:41:33 localhost kernel: [ 4429.545810] md: bind<sdb>
> Jul 25 15:41:33 localhost kernel: [ 4429.546827] md: bind<sdd>
> Jul 25 15:41:33 localhost kernel: [ 4429.546876] md: kicking non-fresh sde from array!
> Jul 25 15:41:33 localhost kernel: [ 4429.546883] md: unbind<sde>
> Jul 25 15:41:33 localhost kernel: [ 4429.546890] md: export_rdev(sde)
> Jul 25 15:41:33 localhost kernel: [ 4429.565035] md/raid:md0: device sdd operational as raid disk 1
> Jul 25 15:41:33 localhost kernel: [ 4429.565041] md/raid:md0: device sdb operational as raid disk 3
> Jul 25 15:41:33 localhost kernel: [ 4429.565045] md/raid:md0: device sdc operational as raid disk 2
> Jul 25 15:41:33 localhost kernel: [ 4429.565631] md/raid:md0: allocated 4222kB
> Jul 25 15:41:33 localhost kernel: [ 4429.573438] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
> Jul 25 15:41:33 localhost kernel: [ 4429.574754] RAID conf printout:
> Jul 25 15:41:33 localhost kernel: [ 4429.574757]  --- level:5 rd:4 wd:3
> Jul 25 15:41:33 localhost kernel: [ 4429.574761]  disk 1, o:1, dev:sdd
> Jul 25 15:41:33 localhost kernel: [ 4429.574765]  disk 2, o:1, dev:sdc
> Jul 25 15:41:33 localhost kernel: [ 4429.574768]  disk 3, o:1, dev:sdb
> Jul 25 15:41:33 localhost kernel: [ 4429.574863] md0: detected capacity change from 0 to 2250468753408
> Jul 25 15:41:33 localhost kernel: [ 4429.575092] md0: unknown partition table
> Jul 25 15:41:33 localhost kernel: [ 4429.626140] md: bind<sde>
>
> So /dev/sde is "non-fresh" and has an unknown partition table.
>
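If I understand "non-fresh" correctly, it just means the event counter
in sde's superblock fell behind the other three members (it missed
updates while the array ran without it), so the kernel refuses to trust
its contents until it has been re-synced. The counters can be compared
directly with something like:

# mdadm --examine /dev/sd[bcde] | grep -E '^/dev|Events'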
Okay, I was able to add it back in by stopping /dev/md_d0 again and then:

# mdadm /dev/md0 --add /dev/sde
mdadm: re-added /dev/sde

So now it's syncing:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Sat Mar 12 21:22:34 2011
     Raid Level : raid5
     Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Jul 25 15:52:29 2011
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 0% complete

           UUID : daf06d5a:b80528b1:2e29483d:f114274d (local to host storage)
         Events : 0.5599

    Number   Major   Minor   RaidDevice State
       4       8       64        0      spare rebuilding   /dev/sde
       1       8       48        1      active sync   /dev/sdd
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sde[4] sdd[1] sdb[3] sdc[2]
      2197723392 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.4% (3470464/732574464) finish=365.0min speed=33284K/sec

unused devices: <none>

However, it's still failing fsck - so does order matter when I
re-assemble the array? I see conflicting answers online.
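My understanding from the mdadm man page is that the argument order on
"mdadm -A" shouldn't matter - each member's superblock records which
slot the disk belongs in, and assembly goes by that rather than by the
command line. The slot a disk claims for itself is visible with:

# mdadm --examine /dev/sdb

(the "this" row in the device table at the bottom of the output). So I
suspect the fsck failure is journal corruption picked up while the array
was degraded, not a mis-ordered assembly - but I'd appreciate
confirmation from someone who knows better.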