Problem with 5disk RAID5 array - two drives lost

Good day,

I'm running FC4 (kernel 2.6.11-1.1369) with a 5-disk RAID5 array. This past weekend, after I rebooted the machine, /dev/md0 would no longer mount; Fedora aborts the boot and forces me to fix the filesystem. Upon further investigation, it looks like I lost two drives within a few weeks of each other. I'll go ahead and get this out of the way: I'm an idiot and didn't set up mdadm -F to mail me about RAID problems.

It appears that /dev/hdf1 failed this past week and /dev/hdh1 failed back in February. I tried an mdadm --assemble --force and got the following:

==========================
mdadm: forcing event count in /dev/hdf1(1) from 777532 upto 777535
mdadm: clearing FAULTY flag for device 2 in /dev/md0 for /dev/hdf1
raid5: raid level 5 set md0 active with 4 out of 5 devices, algorithm 2
mdadm: /dev/md0 has been started with 4 drives (out of 5).
==========================


I then tried to mount /dev/md0 and received the following:
====================
raid5: Disk failure on hdf1, disabling device. Operation continuing on drives
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or other error
In some cases useful info is found in syslog - try dmesg | tail
=====================

In checking dmesg, I find:
==================================
raid5: device hde1 operational as raid disk 0
raid5: device hdc1 operational as raid disk 4
raid5: device hdg1 operational as raid disk 2
raid5: device hdf1 operational as raid disk 1
raid5: allocated 5254kB for md0
raid5: raid level 5 set md0 active with 4 out of 5 devices, algorithm 2
RAID5 conf printout:
--- rd:5 wd:4 fd:1
disk 0, o:1, dev:hde1
disk 1, o:1, dev:hdf1
disk 2, o:1, dev:hdg1
disk 4, o:1, dev:hdc1
usb 1-2: USB disconnect, address 2
usb 1-2: new full speed USB device using uhci_hcd and address 3
usb 1-2: not running at top speed; connect to a high speed hub
scsi1 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 3
usb-storage: waiting for device to settle before scanning
  Vendor: SanDisk   Model: Cruzer Mini       Rev: 0.1
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 1000944 512-byte hdwr sectors (512 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: assuming drive cache: write through
SCSI device sda: 1000944 512-byte hdwr sectors (512 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: assuming drive cache: write through
sda: sda1
Attached scsi removable disk sda at scsi1, channel 0, id 0, lun 0
usb-storage: device scan complete
spurious 8259A interrupt: IRQ7.
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6720, high=0, low=6720, sector=6719
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6719
raid5: Disk failure on hdf1, disabling device. Operation continuing on 3 devices
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6731, high=0, low=6731, sector=6727
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6727
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6735, high=0, low=6735, sector=6735
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6735
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6743, high=0, low=6743, sector=6743
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6743
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6753, high=0, low=6753, sector=6751
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6751
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6763, high=0, low=6763, sector=6759
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6759
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6770, high=0, low=6770, sector=6767
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6767
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6776, high=0, low=6776, sector=6775
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6775
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6803, high=0, low=6803, sector=6783
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6783
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6803, high=0, low=6803, sector=6791
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6791
JBD: Failed to read block at offset 1794
JBD: recovery failed
EXT3-fs: error loading journal.
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6803, high=0, low=6803, sector=6799
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6799
Buffer I/O error on device md0, logical block 1604
lost page write due to I/O error on md0
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6807, high=0, low=6807, sector=6807
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6807
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6815, high=0, low=6815, sector=6815
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6815
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6823, high=0, low=6823, sector=6823
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6823
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6831, high=0, low=6831, sector=6831
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6831
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6841, high=0, low=6841, sector=6839
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6839
hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdf: dma_intr: error=0x40 { UncorrectableError }, LBAsect=6851, high=0, low=6851, sector=6847
ide: failed opcode was: unknown
end_request: I/O error, dev hdf, sector 6847
RAID5 conf printout:
--- rd:5 wd:3 fd:2
disk 0, o:1, dev:hde1
disk 1, o:0, dev:hdf1
disk 2, o:1, dev:hdg1
disk 4, o:1, dev:hdc1
RAID5 conf printout:
--- rd:5 wd:3 fd:2
disk 0, o:1, dev:hde1
disk 2, o:1, dev:hdg1
disk 4, o:1, dev:hdc1
================================
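
For anyone who would rather not read the whole log by eye, here is a rough Python sketch that pulls the failing sectors out of dmesg text like the above and checks the last conf printout. The sample string holds only a few lines copied from the log, the function names are made up for illustration, and I am assuming rd/wd/fd stand for raid, working and failed device counts.

```python
import re

# A few lines copied from the dmesg output above (not the full log).
DMESG_SAMPLE = """\
end_request: I/O error, dev hdf, sector 6719
raid5: Disk failure on hdf1, disabling device. Operation continuing on 3 devices
end_request: I/O error, dev hdf, sector 6727
end_request: I/O error, dev hdf, sector 6735
--- rd:5 wd:3 fd:2
"""

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
# Assumption: rd = raid devices, wd = working devices, fd = failed devices.
CONF = re.compile(r"rd:(\d+) wd:(\d+) fd:(\d+)")


def bad_sectors(log):
    """Map each device name to the sorted sectors that returned I/O errors."""
    found = {}
    for dev, sector in IO_ERROR.findall(log):
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


def raid5_usable(log):
    """Check the last conf printout; return None if the log has none.

    RAID5 tolerates one missing member, so it needs wd >= rd - 1.
    """
    matches = CONF.findall(log)
    if not matches:
        return None
    rd, wd, _fd = (int(x) for x in matches[-1])
    return wd >= rd - 1


if __name__ == "__main__":
    print(bad_sectors(DMESG_SAMPLE))
    print(raid5_usable(DMESG_SAMPLE))
```

Run against the full log, this should list every hdf sector that errored, which at least shows whether the damage is confined to one region of the disk.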

I'm guessing /dev/hdf is shot. I haven't tried an fsck though. Would this be advisable? I don't want to bork all the data. It's about 700 GB of data. I'm open to losing any data that was added since the February drive failure. Is there a way to rebuild the array with /dev/hdh in place of /dev/hdf, accepting possible corruption in files that were added since February?

Any advice would be great. I'm at a loss and I don't want to lose all of the data if I don't have to. I might end up visiting one of those data recovery shops if I can't fix this on my own.

Thank you,

Tim



mdadm -E outputs below:
=================================

/dev/hdc1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 2d1d58c2:23357cca:12b8e65a:a80cdebe
  Creation Time : Tue Jul 26 17:20:10 2005
     Raid Level : raid5
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Apr 16 09:10:28 2006
          State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 3
  Spare Devices : 0
       Checksum : 4a150769 - correct
         Events : 0.777535

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     4      22        1        4      active sync   /dev/hdc1

   0     0      33        1        0      active sync   /dev/hde1
   1     1       0        0        1      faulty removed
   2     2      34        1        2      active sync   /dev/hdg1
   3     3       0        0        3      faulty removed
   4     4      22        1        4      active sync   /dev/hdc1



/dev/hde1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 2d1d58c2:23357cca:12b8e65a:a80cdebe
  Creation Time : Tue Jul 26 17:20:10 2005
     Raid Level : raid5
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Apr 16 09:10:28 2006
          State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 3
  Spare Devices : 0
       Checksum : 4a15076c - correct
         Events : 0.777535

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0      33        1        0      active sync   /dev/hde1

   0     0      33        1        0      active sync   /dev/hde1
   1     1       0        0        1      faulty removed
   2     2      34        1        2      active sync   /dev/hdg1
   3     3       0        0        3      faulty removed
   4     4      22        1        4      active sync   /dev/hdc1




/dev/hdf1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 2d1d58c2:23357cca:12b8e65a:a80cdebe
  Creation Time : Tue Jul 26 17:20:10 2005
     Raid Level : raid5
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Fri Apr 14 13:46:06 2006
          State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 2
  Spare Devices : 0
       Checksum : 4a06c868 - correct
         Events : 0.777532

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     1      33       65        1      active sync   /dev/hdf1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      33       65        1      active sync   /dev/hdf1
   2     2      34        1        2      active sync   /dev/hdg1
   3     3       0        0        3      faulty removed
   4     4      22        1        4      active sync   /dev/hdc1

/dev/hdh1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 2d1d58c2:23357cca:12b8e65a:a80cdebe
  Creation Time : Tue Jul 26 17:20:10 2005
     Raid Level : raid5
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Tue Feb 21 07:47:51 2006
          State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
  Spare Devices : 0
       Checksum : 49c0be2c - correct
         Events : 0.698097

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     3      34       65        3      active sync   /dev/hdh1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      33       65        1      active sync   /dev/hdf1
   2     2      34        1        2      active sync   /dev/hdg1
   3     3      34       65        3      active sync   /dev/hdh1
   4     4      22        1        4      active sync   /dev/hdc1



/dev/hdg1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 2d1d58c2:23357cca:12b8e65a:a80cdebe
  Creation Time : Tue Jul 26 17:20:10 2005
     Raid Level : raid5
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Apr 16 09:10:28 2006
          State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 3
  Spare Devices : 0
       Checksum : 4a150771 - correct
         Events : 0.777535

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     2      34        1        2      active sync   /dev/hdg1

   0     0      33        1        0      active sync   /dev/hde1
   1     1       0        0        1      faulty removed
   2     2      34        1        2      active sync   /dev/hdg1
   3     3       0        0        3      faulty removed
   4     4      22        1        4      active sync   /dev/hdc1
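
To put a number on how stale each member is, here is a rough Python sketch that reads the Events field from mdadm -E text and reports how far each device lags behind the newest one. The sample holds only the device names, update times and event counts copied from the output above, and the function names are made up for illustration. I am assuming the "0.777535" form is the high and low 32-bit halves of one counter, which fits the "forcing event count ... from 777532 upto 777535" message.

```python
import re

# Device name, Update Time and Events copied from the mdadm -E output above.
EXAMINE_SAMPLE = """\
/dev/hdc1:
    Update Time : Sun Apr 16 09:10:28 2006
         Events : 0.777535
/dev/hdf1:
    Update Time : Fri Apr 14 13:46:06 2006
         Events : 0.777532
/dev/hdh1:
    Update Time : Tue Feb 21 07:47:51 2006
         Events : 0.698097
"""

DEVICE = re.compile(r"^(/dev/\w+):\s*$")
EVENTS = re.compile(r"^\s*Events : (\d+)\.(\d+)\s*$")


def event_counts(text):
    """Return {device: event count} parsed from mdadm -E style text."""
    counts = {}
    current = None
    for line in text.splitlines():
        m = DEVICE.match(line)
        if m:
            current = m.group(1)
            continue
        m = EVENTS.match(line)
        if m and current is not None:
            # Assumption: the field is printed as "high.low" 32-bit halves.
            counts[current] = (int(m.group(1)) << 32) | int(m.group(2))
    return counts


def lag_behind_newest(counts):
    """Return how many events each device is behind the newest member."""
    newest = max(counts.values())
    return {dev: newest - n for dev, n in counts.items()}


if __name__ == "__main__":
    for dev, lag in sorted(lag_behind_newest(event_counts(EXAMINE_SAMPLE)).items()):
        print(dev, lag)
```

On the numbers above, hdf1 is 3 events behind the current members and hdh1 is 79438 behind, which matches hdh1 having dropped out back in February.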



