raid6 glorious failure - help (or at least a shoulder to cry on) needed

Hi All,

I'm in a bit of trouble with a raid6 array and any feedback would be
appreciated...

Of course, no one will be held responsible for whatever happens to the array;
anything I do with it will ultimately be my decision alone.

So, long story short:

- raid6, 5 x 3 TB disks, Seagate Barracuda (ST3000DM001-1ER1 x 2,
ST3000DM008-2DM1 x 2, ST3000DM001-1CH1)
- I know (now, after reading raid.wiki.kernel.org) that the HDD choice was not
very good and I have to deal with it now (I suspect that either the disks'
missing "scterc" feature or a hardware failure of the LSI controller caused
all this mess)
- disk connections: mixed between motherboard SATA ports and the LSI HBA
controller; maybe not the best idea to mix, but this setup has worked fine for
2-3 years...
- CentOS release 6.9 (Final), ASRock H77 Pro4-M, 8GB RAM, LSI controller
(SAS 9217-8i Host Bus Adapter)
- another raid1 (Samsung SSD 840) holding the OS is still running with no
glitches, both disks connected via motherboard SATA


(Please note that the /dev/sdX letters below may differ from one output to
the next, as I have added/removed other disks to clone the raid6 disks and
changed their SATA ports)

############################################################################
1. The array suddenly went offline. (I have quite a few logs but copied just
what I thought would be helpful; please let me know if the full log (~600k)
would be better.) It looks like I may have some filesystem errors too,
but hey, don't discourage me - one problem at a time!

Mar  9 19:35:29 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is
non-operational !!!!
Mar  9 19:35:30 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is
non-operational !!!!
Mar  9 19:35:31 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is
non-operational !!!!
Mar  9 19:35:32 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is
non-operational !!!!
Mar  9 19:35:33 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is
non-operational !!!!
Mar  9 19:35:34 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is
non-operational !!!!
Mar  9 19:35:34 space kernel: mpt2sas0: _base_fault_reset_work: Running
mpt2sas_dead_ioc thread success !!!!
Mar  9 19:35:34 space kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Mar  9 19:35:34 space kernel: sd 0:0:0:0: [sda]
Mar  9 19:35:34 space kernel: Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
Mar  9 19:35:34 space kernel: mpt2sas0: removing handle(0x000c),
sas_addr(0x4433221107000000)
Mar  9 19:35:34 space kernel: sd 0:0:1:0: [sdb] Synchronizing SCSI cache
Mar  9 19:35:34 space kernel: sd 0:0:1:0: [sdb]
Mar  9 19:35:34 space kernel: Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
Mar  9 19:35:34 space kernel: mpt2sas0: removing handle(0x0009),
sas_addr(0x4433221104000000)
Mar  9 19:35:34 space kernel: sd 0:0:2:0: [sdc] Synchronizing SCSI cache
Mar  9 19:35:34 space kernel: sd 0:0:2:0: [sdc]
Mar  9 19:36:31 space kernel: Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
Mar  9 19:36:31 space kernel: mpt2sas0: removing handle(0x000a),
sas_addr(0x4433221105000000)
Mar  9 19:36:31 space kernel: sd 0:0:3:0: [sdd] Synchronizing SCSI cache
Mar  9 19:36:31 space kernel: sd 0:0:3:0: [sdd]
Mar  9 19:36:31 space kernel: Result: hostbyte=DID_NO_CONNECT
driverbyte=DRIVER_OK
Mar  9 19:36:31 space kernel: mpt2sas0: removing handle(0x000b),
sas_addr(0x4433221106000000)
Mar  9 19:36:31 space kernel: mpt2sas0: sending diag reset !!
Mar  9 19:36:31 space kernel: mpt2sas0: diag reset: FAILED
Mar  9 19:36:31 space kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr
0x90800 action 0xe frozen
Mar  9 19:36:31 space kernel: ata3.00: SError: { HostInt PHYRdyChg 10B8B }
Mar  9 19:36:31 space kernel: ata3.00: failed command: FLUSH CACHE
Mar  9 19:36:31 space kernel: ata3.00: cmd
e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar  9 19:36:31 space kernel:         res
40/00:01:e0:4f:c2/00:00:00:00:00/00 Emask 0x54 (ATA bus error)
Mar  9 19:36:31 space kernel: ata3.00: status: { DRDY }
Mar  9 19:36:31 space kernel: ata3.00: hard resetting link
Mar  9 19:36:31 space kernel: ata4.00: exception Emask 0x50 SAct 0x0 SErr
0x90800 action 0xe frozen
Mar  9 19:36:31 space kernel: ata4.00: SError: { HostInt PHYRdyChg 10B8B }
Mar  9 19:36:31 space kernel: ata4.00: failed command: FLUSH CACHE
Mar  9 19:36:31 space kernel: ata4.00: cmd
e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar  9 19:36:31 space kernel:         res
40/00:01:e0:4f:c2/00:00:00:00:00/00 Emask 0x54 (ATA bus error)
Mar  9 19:36:31 space kernel: ata4.00: status: { DRDY }
Mar  9 19:36:31 space kernel: ata4.00: hard resetting link
Mar  9 19:36:31 space kernel: ata3.01: hard resetting link
Mar  9 19:36:31 space kernel: ata4.01: hard resetting link
Mar  9 19:36:31 space kernel: ata3.00: SATA link up 6.0 Gbps (SStatus 133
SControl 330)
Mar  9 19:36:31 space kernel: ata3.01: SATA link down (SStatus 0 SControl
330)
Mar  9 19:36:31 space kernel: ata4.00: SATA link up 6.0 Gbps (SStatus 133
SControl 330)
Mar  9 19:36:31 space kernel: ata4.01: SATA link down (SStatus 0 SControl
330)
Mar  9 19:36:31 space kernel: ata3.00: configured for UDMA/133
Mar  9 19:36:31 space kernel: ata3.00: retrying FLUSH 0xe7 Emask 0x54
Mar  9 19:36:31 space kernel: ata3: EH complete
Mar  9 19:36:31 space kernel: ata4.00: configured for UDMA/133
Mar  9 19:36:31 space kernel: ata4.00: retrying FLUSH 0xe7 Emask 0x54
Mar  9 19:36:31 space kernel: ata4: EH complete
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sdc, disabling
device.
Mar  9 19:36:36 space kernel: md/raid:md127: Operation continuing on 4
devices.
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sda, disabling
device.
Mar  9 19:36:36 space kernel: md/raid:md127: Operation continuing on 3
devices.
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sdb, disabling
device.
Mar  9 19:36:36 space kernel: md/raid:md127: Operation continuing on 2
devices.
Mar  9 19:36:36 space kernel: md: super_written gets error=-19, uptodate=0
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sdd, disabling
device.
Mar  9 19:36:36 space kernel: md/raid:md127: Operation continuing on 1
devices.
Mar  9 19:36:37 space kernel: md: unbind<sdd>
Mar  9 19:36:37 space kernel: md: export_rdev(sdd)
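
For anyone skimming the excerpt above: the md lines report four members
dropped within the same second, which is more than raid6 (two parity disks)
can absorb. Below is a minimal Python sketch that pulls that sequence out of a
syslog excerpt. The function names are my own, the sample text is copied from
the log above, and the regex only matches the message wording seen in this
excerpt, so treat it as an illustration rather than a general parser.

```python
import re

# Matches the md failure message as it appears in the excerpt above.
# \s+ between words lets mail-wrapped lines match as well.
FAILURE_RE = re.compile(
    r"md/raid:(?P<array>\w+):\s+Disk\s+failure\s+on\s+(?P<dev>\w+),"
)


def failed_members(log_text):
    """Return (array, device) pairs in the order the kernel reported them."""
    return [(m.group("array"), m.group("dev"))
            for m in FAILURE_RE.finditer(log_text)]


def survives(failures, parity_disks=2):
    """raid6 stays usable only while failures <= number of parity disks."""
    return failures <= parity_disks


SAMPLE = """\
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sdc, disabling
device.
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sda, disabling
device.
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sdb, disabling
device.
Mar  9 19:36:36 space kernel: md/raid:md127: Disk failure on sdd, disabling
device.
"""

if __name__ == "__main__":
    members = failed_members(SAMPLE)
    print("dropped, in order:", [dev for _, dev in members])
    print("array still usable:", survives(len(members)))
```

The four dropped devices (sda-sdd) are the ones the mpt2sas lines show being
removed from the LSI controller, which is why I suspect the controller rather
than the individual disks.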

Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory
lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory
lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory
lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory
lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory
lblock 0
Mar 10 03:00:08 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:00:08 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #52142087: comm updatedb: reading directory
lblock 0
Mar 10 03:00:08 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:00:08 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #52142087: comm updatedb: reading directory
lblock 0
Mar 10 03:00:08 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
[ many many many lines like the section above ]

Mar 10 03:22:56 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:22:56 space kernel: quiet_error: 2696 callbacks suppressed
Mar 10 03:22:56 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:22:56 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:22:56 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #51585030: comm smbd: reading directory lblock 0
Mar 10 03:22:56 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block
0)
Mar 10 03:23:00 space kernel: Aborting journal on device dm-3-8.
Mar 10 03:23:00 space kernel: Buffer I/O error on device dm-3, logical block
731938816
Mar 10 03:23:00 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:00 space kernel: JBD2: Error -5 detected when updating journal
superblock for dm-3-8.
Mar 10 03:23:00 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:00 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:00 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:00 space kernel: EXT4-fs error (device dm-3):
__ext4_journal_start_sb:62: Detected aborted journal
Mar 10 03:23:00 space kernel: EXT4-fs (dm-3): Remounting filesystem
read-only
Mar 10 03:23:00 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:00 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:00 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:02 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:02 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:02 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #51585029: comm smbd: reading directory lblock 0
Mar 10 03:23:02 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585029, block
0)
Mar 10 03:23:03 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block
0)
Mar 10 03:23:04 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:04 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:04 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:04 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #51658758: comm smbd: reading directory lblock 0
Mar 10 03:23:04 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51658758, block
0)
Mar 10 03:23:05 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:05 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:05 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:05 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #51560456: comm smbd: reading directory lblock 0
Mar 10 03:23:05 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51560456, block
0)
Mar 10 03:23:06 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:1372: error reading directory block (ino 51512001,
block 2)
Mar 10 03:23:06 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:09 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:09 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd:
unable to read itable block
Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd:
unable to read itable block
Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block
0)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block
0)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585029, block
0)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585029, block
0)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51560456, block
0)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51560456, block
0)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51658758, block
0)
Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51658758, block
0)
Mar 10 03:23:11 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:11 space kernel: quiet_error: 2 callbacks suppressed
Mar 10 03:23:11 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:11 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:11 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #51650567: comm smbd: reading directory lblock 0
Mar 10 03:23:11 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51650567, block
0)
Mar 10 03:23:11 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51650567, block
0)
Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51512473: block 206045289: comm smbd:
unable to read itable block
Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51512654: block 206045300: comm smbd:
unable to read itable block
Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #162269586: block 649068729: comm smbd:
unable to read itable block
Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51512433: block 206045287: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585029, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585029, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51560456, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51560456, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51658758, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51658758, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51650567, block
0)
Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51650567, block
0)
Mar 10 03:23:31 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:31 space kernel: quiet_error: 7 callbacks suppressed
Mar 10 03:23:31 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:31 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:31 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #355731472: comm smbd: reading directory lblock
0
Mar 10 03:23:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 355731472,
block 0)
Mar 10 03:23:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 355731472,
block 0)
Mar 10 03:23:32 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 355731472,
block 0)
Mar 10 03:23:32 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:32 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:23:32 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:23:32 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51512473: block 206045289: comm smbd:
unable to read itable block
Mar 10 03:23:33 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:23:33 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:24:26 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:26 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51512654: block 206045300: comm ls: unable
to read itable block
Mar 10 03:24:26 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:24:26 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:24:26 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:26 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #162269586: block 649068729: comm ls:
unable to read itable block
Mar 10 03:24:26 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:24:26 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:24:26 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:26 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51512433: block 206045287: comm ls: unable
to read itable block
Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block
0)
Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51585029, block
0)
Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51560456, block
0)
Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51658758, block
0)
Mar 10 03:24:31 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:24:31 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:24:31 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:31 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm du: unable
to read itable block
Mar 10 03:24:31 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:24:31 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:24:35 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:35 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:24:35 space kernel: EXT4-fs (dm-3): previous I/O error to
superblock detected
Mar 10 03:24:35 space kernel: Buffer I/O error on device dm-3, logical block
0
Mar 10 03:24:35 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:35 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:24:35 space kernel: EXT4-fs warning (device dm-3):
__ext4_read_dirblock:908: error reading directory block (ino 51512001, block
2)
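
Since the EXT4 messages above repeat the same few inodes and blocks over and
over, a short Python sketch like the following could condense them. It is only
an illustration: the function name is hypothetical, the regexes are written
against the message wording in this excerpt, and the sample is a handful of
lines copied from the log above.

```python
import re
from collections import Counter

# "inode #N" appears in both the ext4_find_entry and __ext4_get_inode_loc
# messages; "inode #N: block B:" only in the inode-table read errors.
INODE_RE = re.compile(r"inode\s+#(\d+)")
ITABLE_RE = re.compile(r"inode\s+#\d+:\s+block\s+(\d+):")


def summarize(log_text):
    """Count how often each inode and each inode-table block is mentioned."""
    inodes = Counter(INODE_RE.findall(log_text))
    itable_blocks = Counter(ITABLE_RE.findall(log_text))
    return inodes, itable_blocks


SAMPLE = """\
Mar 10 03:23:02 space kernel: EXT4-fs error (device dm-3):
ext4_find_entry:1309: inode #51585029: comm smbd: reading directory lblock 0
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd:
unable to read itable block
Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3):
__ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd:
unable to read itable block
"""

if __name__ == "__main__":
    inodes, blocks = summarize(SAMPLE)
    print("distinct inodes:", len(inodes))
    for block, count in blocks.most_common():
        print(f"itable block {block}: {count} error(s)")
```

Run against the full log, this would show whether the errors are spread across
the filesystem or are just the same few directories being retried by smbd and
updatedb after the array went away.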


############################################################################
2. Since the failure I have done no writes to those disks.




############################################################################
3. smartctl long and short tests show the disks are ok; I can provide the
output should you think it is useful.





############################################################################
4. The "mdadm --examine" output (I've marked the timestamps and event numbers
with "<<<<" signs):

/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100  (local to host storage00server)
  Creation Time : Thu May  9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : 74882e49:8294ae56:1c6eafbe:2c9eb6ec

    Update Time : Fri Mar  9 11:33:32 2018
<<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : e0b8ef21 - correct
         Events : 2444205
<<<<<<<<<<<<<<<<<<<<<<<

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)




/dev/sdb:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)



/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100  (local to host storage00server)
  Creation Time : Thu May  9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : 325fcaac:8195916b:8cb2871b:3f54f1c4

    Update Time : Fri Mar  9 11:33:32 2018
<<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 8e4ac163 - correct
         Events : 2444205
<<<<<<<<<<<<<<<<<<<<<<<

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100  (local to host storage00server)
  Creation Time : Thu May  9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : fd3ccca5:2f0ec0af:1e1f64f8:be53ce86

    Update Time : Fri Mar  9 11:33:32 2018
<<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 6a3483eb - correct
         Events : 2444205
<<<<<<<<<<<<<<<<<<<<<<<

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdg:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100  (local to host storage00server)
  Creation Time : Thu May  9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : 3fe05e31:aea12f6f:30219c17:c858e069

    Update Time : Sat Mar 10 03:28:16 2018
<<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b776a20c - correct
         Events : 2444333
<<<<<<<<<<<<<<<<<<<<<<<

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : ...A. ('A' == active, '.' == missing, 'R' == replacing)



So one disk out of five, sdb, is completely gone from the raid6 array's point
of view: --examine finds no md superblock on it, only a protective MBR
(partition type ee).
Three of the others (sda, sde and sdf) have the same event count (2 444 205)
and the same update time (Fri Mar  9 11:33:32 2018).
The last one, sdg, has a later update time (Sat Mar 10 03:28:16 2018) and a
higher event count (2 444 333), i.e. it is 128 events ahead of the other three.
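
The comparison above can also be done mechanically. What follows is only a
minimal sketch in Python, assuming the --examine output has been saved as text;
the names EXAMINE_TEXT, events_by_device and group_by_events are made up for
this illustration, and the sample text is abridged from the output above.

```python
import re
from collections import defaultdict

# Abridged from the --examine output above; only the fields used here are kept.
EXAMINE_TEXT = """\
/dev/sda:
    Update Time : Fri Mar  9 11:33:32 2018
         Events : 2444205
/dev/sdb:
   MBR Magic : aa55
/dev/sde:
    Update Time : Fri Mar  9 11:33:32 2018
         Events : 2444205
/dev/sdf:
    Update Time : Fri Mar  9 11:33:32 2018
         Events : 2444205
/dev/sdg:
    Update Time : Sat Mar 10 03:28:16 2018
         Events : 2444333
"""


def events_by_device(text):
    """Map each device to its event count, or None if no Events line follows it."""
    events = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(/dev/\S+):\s*$", line)
        if header:
            current = header.group(1)
            events[current] = None  # stays None for a disk without md superblock
            continue
        field = re.match(r"^\s*Events\s*:\s*(\d+)\s*$", line)
        if field and current is not None:
            events[current] = int(field.group(1))
    return events


def group_by_events(events):
    """Group device names by event count, keeping the order they were listed in."""
    groups = defaultdict(list)
    for device, count in events.items():
        groups[count].append(device)
    return dict(groups)


if __name__ == "__main__":
    for count, devices in group_by_events(events_by_device(EXAMINE_TEXT)).items():
        print(count, devices)
```

Devices grouped under None are the ones for which no Events line was found,
which is how sdb shows up here.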


############################################################################
###################
5. /dev/md127 is automatically detected by the system at boot and brought to
the state shown in the output below.
It seems to be trying to assemble /dev/md127 automatically using the disk
with the latest timestamp.


# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md0 : active raid1 sdf1[3] sdg1[2]
      716736 blocks super 1.0 [2/2] [UU]

md4 : active raid0 sdi[1] sdh[0]
      3906766848 blocks super 1.2 512k chunks

md127 : active raid6 sda[5](F) sdb[7](F) sde[8] sdc[9](F)
      8790405120 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/1]
[___U_]

md1 : active raid1 sdg2[2] sdf2[1]
      116436864 blocks super 1.1 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
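
For anyone reading the md127 entry above: "(F)" marks a member flagged as
faulty, "[5/1]" is the number of raid devices versus the number currently
active, and "[___U_]" is the per-slot map. A small Python sketch that pulls
those fields out of the entry follows; the names MDSTAT_MD127 and
parse_mdstat_entry are invented for this example, and the text is the md127
entry exactly as wrapped above.

```python
import re

# The md127 entry from /proc/mdstat as pasted above (including the mail wrap).
MDSTAT_MD127 = """\
md127 : active raid6 sda[5](F) sdb[7](F) sde[8] sdc[9](F)
      8790405120 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/1]
[___U_]
"""


def parse_mdstat_entry(text):
    """Extract members, faulty flags, device counts and the slot map."""
    # Members look like "sda[5]" optionally followed by flags such as "(F)".
    members = [
        {"name": name, "number": int(number), "faulty": "(F)" in flags}
        for name, number, flags in re.findall(r"(\w+)\[(\d+)\]((?:\(\w\))*)", text)
    ]
    counts = re.search(r"\[(\d+)/(\d+)\]", text)  # e.g. "[5/1]"
    slots = re.search(r"\[([U_]+)\]", text)       # e.g. "[___U_]"
    slot_map = slots.group(1) if slots else ""
    return {
        "members": members,
        "raid_devices": int(counts.group(1)) if counts else None,
        "active_devices": int(counts.group(2)) if counts else None,
        "slot_map": slot_map,
        "up_slots": [i for i, c in enumerate(slot_map) if c == "U"],
    }


if __name__ == "__main__":
    entry = parse_mdstat_entry(MDSTAT_MD127)
    print([m["name"] for m in entry["members"] if m["faulty"]])
    print(entry["raid_devices"], entry["active_devices"], entry["up_slots"])
```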



# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu May  9 21:09:42 2013
     Raid Level : raid6
  Used Dev Size : -1
   Raid Devices : 5
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Sat Mar 10 03:28:16 2018
          State : active, FAILED, Not Started
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : storage00server:100  (local to host storage00server)
           UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
         Events : 2444333

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       2       0        0        2      removed
       4       0        0        4      removed
       8       8       80        3      active sync   /dev/sdf <<<<<<<<<<
(previously detected as sdg and having the greatest number of events -
2444333)
       8       0        0        8      removed



############################################################################
###################
6. With 5 devices in the raid6, I had the fancy idea of assembling the array
from only the three drives that share the same event count and timestamp,
but I got this output:

# mdadm --verbose --assemble --readonly /dev/md13 /dev/sda /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md13
mdadm: Found some drive for an array that is already active:
/dev/md/storage00server:100
mdadm: giving up.

OK, that does not look like a good idea. Let's "mdadm --stop /dev/md127" and
then reuse its old name, md127:

# mdadm --verbose --assemble --readonly /dev/md127 /dev/sda /dev/sdf
/dev/sdg
mdadm: looking for devices for /dev/md127
mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 1.
mdadm: added /dev/sdg to /dev/md127 as 1
mdadm: added /dev/sdf to /dev/md127 as 2
mdadm: no uptodate device for slot 3 of /dev/md127
mdadm: no uptodate device for slot 4 of /dev/md127
mdadm: added /dev/sda to /dev/md127 as 0
mdadm: /dev/md127 assembled from 3 drives - need all 5 to start it (use
--run to insist).

Should I insist?
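
For reference, the arithmetic behind "3 drives": raid6 keeps two devices'
worth of parity, so a 5-device array has 3 devices' worth of data and cannot
run with fewer than 3 members. The Python sketch below only checks those
numbers against the --examine output; it says nothing about whether --run is
safe here. The function names are made up, and I am assuming that Used Dev
Size is in 512-byte sectors and Array Size in KiB, which is what the GB
figures in parentheses suggest.

```python
def min_members_to_start(raid_devices, parity_devices=2):
    """Fewest members a parity array can run with (raid6: two parity devices)."""
    return raid_devices - parity_devices


def expected_array_size_kib(used_dev_size_sectors, raid_devices, parity_devices=2):
    """Array capacity in KiB, assuming Used Dev Size is in 512-byte sectors."""
    data_devices = raid_devices - parity_devices
    return used_dev_size_sectors * 512 * data_devices // 1024


if __name__ == "__main__":
    # Values taken from the --examine output: Raid Devices and Used Dev Size.
    print(min_members_to_start(5))
    print(expected_array_size_kib(5860270080, 5))
```

With the values from the superblocks the second function gives 8790405120,
which matches the reported Array Size.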

############################################################################
############################


I am now imaging the physical disks with dd+bzip before trying anything
potentially dangerous, which is quite time-consuming.
So before I do anything with the array I also have to figure out where to
find the disk space for these copies.
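
To put a number on the space problem, here is a rough Python sketch of the
same idea as dd+bzip, plus the uncompressed size of one disk. The function
names are hypothetical, the raw size is assumed to be Avail Dev Size plus Data
Offset (both taken as 512-byte sectors), and unlike dd with error-skipping
options this sketch simply stops at the first read error.

```python
import bz2
import shutil


def image_to_bz2(source_path, image_path, chunk_size=1024 * 1024):
    """Stream source_path into a bzip2-compressed image, chunk by chunk."""
    # Opening the source read-only means nothing is ever written to it.
    with open(source_path, "rb") as src, bz2.open(image_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


def raw_disk_bytes(avail_dev_size_sectors, data_offset_sectors, sector_size=512):
    """Uncompressed size of one member disk, assuming both values are sectors."""
    return (avail_dev_size_sectors + data_offset_sectors) * sector_size


if __name__ == "__main__":
    per_disk = raw_disk_bytes(5860271024, 262144)
    print(per_disk)      # bytes for one disk
    print(per_disk * 5)  # worst case for all five disks, if nothing compresses
```

How much bzip actually saves depends entirely on the data, so the
uncompressed figure is the only safe upper bound.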

The bottom line is that I really need my data back: I have a partial backup
from 1-2 months ago, but I have since added new files that are quite
important.

Anyway, to circle back to the beginning of the email: any ideas would be
appreciated, and feel free to ask for more info if needed.

Kind Regards,
JL




