Hi All,

I'm in a bit of trouble with a raid6 array and any feedback would be appreciated... Of course no one will be held responsible for what happens to the array, as anything done to it will ultimately be my decision only.

So, long story short:
- raid 6,
- 5 x 3 TB disks, Seagate Barracuda (ST3000DM001-1ER1 x 2, ST3000DM008-2DM1 x 2, ST3000DM001-1CH1)
- I know (now, after reading raid.wiki.kernel.org) that the HDD choice was not very good and I have to deal with it now (I suspect that either the missing "scterc" feature of the disks or a hardware failure of the LSI controller caused all this mess)
- disk connections: mixed, via motherboard SATA ports and the LSI HBA controller; maybe not the best idea to mix, but this setup has worked fine for 2-3 years now...
- CentOS release 6.9 (Final), ASRock H77 Pro4-M, 8GB RAM, LSI controller (SAS 9217-8i Host Bus Adapter)
- another raid1 (Samsung SSD 840) with the OS, still running with no glitches, both disks connected via motherboard SATA

(Please note that the /dev/sdX letters below may change, as I have added/removed other disks to clone the raid6 disks or changed their SATA ports)

###############################################################################################
1. Suddenly the array went offline (I have quite a few logs but copied just what I thought would be helpful; please let me know if the full log (~600k) would be better). It looks like I may have some filesystem errors too, but hey, don't discourage me - one problem at a time!

Mar 9 19:35:29 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational !!!!
Mar 9 19:35:30 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational !!!!
Mar 9 19:35:31 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational !!!!
Mar 9 19:35:32 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational !!!!
Mar 9 19:35:33 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational !!!!
Mar 9 19:35:34 space kernel: mpt2sas0: _base_fault_reset_work : SAS host is non-operational !!!!
Mar 9 19:35:34 space kernel: mpt2sas0: _base_fault_reset_work: Running mpt2sas_dead_ioc thread success !!!!
Mar 9 19:35:34 space kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Mar 9 19:35:34 space kernel: sd 0:0:0:0: [sda]
Mar 9 19:35:34 space kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 9 19:35:34 space kernel: mpt2sas0: removing handle(0x000c), sas_addr(0x4433221107000000)
Mar 9 19:35:34 space kernel: sd 0:0:1:0: [sdb] Synchronizing SCSI cache
Mar 9 19:35:34 space kernel: sd 0:0:1:0: [sdb]
Mar 9 19:35:34 space kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 9 19:35:34 space kernel: mpt2sas0: removing handle(0x0009), sas_addr(0x4433221104000000)
Mar 9 19:35:34 space kernel: sd 0:0:2:0: [sdc] Synchronizing SCSI cache
Mar 9 19:35:34 space kernel: sd 0:0:2:0: [sdc]
Mar 9 19:36:31 space kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 9 19:36:31 space kernel: mpt2sas0: removing handle(0x000a), sas_addr(0x4433221105000000)
Mar 9 19:36:31 space kernel: sd 0:0:3:0: [sdd] Synchronizing SCSI cache
Mar 9 19:36:31 space kernel: sd 0:0:3:0: [sdd]
Mar 9 19:36:31 space kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 9 19:36:31 space kernel: mpt2sas0: removing handle(0x000b), sas_addr(0x4433221106000000)
Mar 9 19:36:31 space kernel: mpt2sas0: sending diag reset !!
Mar 9 19:36:31 space kernel: mpt2sas0: diag reset: FAILED
Mar 9 19:36:31 space kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x90800 action 0xe frozen
Mar 9 19:36:31 space kernel: ata3.00: SError: { HostInt PHYRdyChg 10B8B }
Mar 9 19:36:31 space kernel: ata3.00: failed command: FLUSH CACHE
Mar 9 19:36:31 space kernel: ata3.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 9 19:36:31 space kernel: res 40/00:01:e0:4f:c2/00:00:00:00:00/00 Emask 0x54 (ATA bus error)
Mar 9 19:36:31 space kernel: ata3.00: status: { DRDY }
Mar 9 19:36:31 space kernel: ata3.00: hard resetting link
Mar 9 19:36:31 space kernel: ata4.00: exception Emask 0x50 SAct 0x0 SErr 0x90800 action 0xe frozen
Mar 9 19:36:31 space kernel: ata4.00: SError: { HostInt PHYRdyChg 10B8B }
Mar 9 19:36:31 space kernel: ata4.00: failed command: FLUSH CACHE
Mar 9 19:36:31 space kernel: ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 9 19:36:31 space kernel: res 40/00:01:e0:4f:c2/00:00:00:00:00/00 Emask 0x54 (ATA bus error)
Mar 9 19:36:31 space kernel: ata4.00: status: { DRDY }
Mar 9 19:36:31 space kernel: ata4.00: hard resetting link
Mar 9 19:36:31 space kernel: ata3.01: hard resetting link
Mar 9 19:36:31 space kernel: ata4.01: hard resetting link
Mar 9 19:36:31 space kernel: ata3.00: SATA link up 6.0 Gbps (SStatus 133 SControl 330)
Mar 9 19:36:31 space kernel: ata3.01: SATA link down (SStatus 0 SControl 330)
Mar 9 19:36:31 space kernel: ata4.00: SATA link up 6.0 Gbps (SStatus 133 SControl 330)
Mar 9 19:36:31 space kernel: ata4.01: SATA link down (SStatus 0 SControl 330)
Mar 9 19:36:31 space kernel: ata3.00: configured for UDMA/133
Mar 9 19:36:31 space kernel: ata3.00: retrying FLUSH 0xe7 Emask 0x54
Mar 9 19:36:31 space kernel: ata3: EH complete
Mar 9 19:36:31 space kernel: ata4.00: configured for UDMA/133
Mar 9 19:36:31 space kernel: ata4.00: retrying FLUSH 0xe7 Emask 0x54
Mar 9 19:36:31 space kernel: ata4: EH complete
Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sdc, disabling device.
Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 4 devices.
Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sda, disabling device.
Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 3 devices.
Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sdb, disabling device.
Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 2 devices.
Mar 9 19:36:36 space kernel: md: super_written gets error=-19, uptodate=0
Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sdd, disabling device.
Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 1 devices.
Mar 9 19:36:37 space kernel: md: unbind<sdd>
Mar 9 19:36:37 space kernel: md: export_rdev(sdd)
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory lblock 0
Mar 10 03:00:02 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:00:02 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:00:02 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:00:02 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #335282868: comm updatedb: reading directory lblock 0
Mar 10 03:00:08 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:00:08 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #52142087: comm updatedb: reading directory lblock 0
Mar 10 03:00:08 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:00:08 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #52142087: comm updatedb: reading directory lblock 0
Mar 10 03:00:08 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected

[ many many many lines like the section above ]

Mar 10 03:22:56 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:22:56 space kernel: quiet_error: 2696 callbacks suppressed
Mar 10 03:22:56 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:22:56 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:22:56 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #51585030: comm smbd: reading directory lblock 0
Mar 10 03:22:56 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585030, block 0)
Mar 10 03:23:00 space kernel: Aborting journal on device dm-3-8.
Mar 10 03:23:00 space kernel: Buffer I/O error on device dm-3, logical block 731938816 Mar 10 03:23:00 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:00 space kernel: JBD2: Error -5 detected when updating journal superblock for dm-3-8. Mar 10 03:23:00 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:00 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:00 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:00 space kernel: EXT4-fs error (device dm-3): __ext4_journal_start_sb:62: Detected aborted journal Mar 10 03:23:00 space kernel: EXT4-fs (dm-3): Remounting filesystem read-only Mar 10 03:23:00 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:00 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:00 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:02 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:02 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:02 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:02 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #51585029: comm smbd: reading directory lblock 0 Mar 10 03:23:02 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585029, block 0) Mar 10 03:23:03 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585030, block 0) Mar 10 03:23:04 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:04 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:04 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:04 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #51658758: comm smbd: reading directory lblock 0 Mar 10 03:23:04 space kernel: EXT4-fs warning 
(device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51658758, block 0) Mar 10 03:23:05 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:05 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:05 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:05 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #51560456: comm smbd: reading directory lblock 0 Mar 10 03:23:05 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51560456, block 0) Mar 10 03:23:06 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:1372: error reading directory block (ino 51512001, block 2) Mar 10 03:23:06 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd: unable to read itable block Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3): 
__ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd: unable to read itable block Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:06 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:06 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:06 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:06 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:06 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:09 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:09 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd: unable to read itable block Mar 10 03:23:10 space kernel: 
EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:10 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:10 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd: unable to read itable block Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:10 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:10 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585030, block 0) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): 
__ext4_read_dirblock:908: error reading directory block (ino 51585030, block 0) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585029, block 0) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585029, block 0) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51560456, block 0) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51560456, block 0) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51658758, block 0) Mar 10 03:23:10 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51658758, block 0) Mar 10 03:23:11 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:11 space kernel: quiet_error: 2 callbacks suppressed Mar 10 03:23:11 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:11 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:11 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #51650567: comm smbd: reading directory lblock 0 Mar 10 03:23:11 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51650567, block 0) Mar 10 03:23:11 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51650567, block 0) Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51512473: block 
206045289: comm smbd: unable to read itable block Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51512654: block 206045300: comm smbd: unable to read itable block Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #162269586: block 649068729: comm smbd: unable to read itable block Mar 10 03:23:12 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:12 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:12 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:12 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51512433: block 206045287: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:13 space kernel: lost page write due to 
I/O error on dm-3 Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:13 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 
03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511559: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511556: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511913: block 206045254: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:13 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd: unable to read itable block Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585030, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585030, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585029, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning 
(device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585029, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51560456, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51560456, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51658758, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51658758, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51650567, block 0) Mar 10 03:23:13 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51650567, block 0) Mar 10 03:23:31 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:31 space kernel: quiet_error: 7 callbacks suppressed Mar 10 03:23:31 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:31 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:31 space kernel: EXT4-fs error (device dm-3): ext4_find_entry:1309: inode #355731472: comm smbd: reading directory lblock 0 Mar 10 03:23:31 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 355731472, block 0) Mar 10 03:23:31 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 355731472, block 0) Mar 10 03:23:32 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 355731472, block 0) Mar 10 03:23:32 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:32 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:23:32 space 
kernel: lost page write due to I/O error on dm-3 Mar 10 03:23:32 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51512473: block 206045289: comm smbd: unable to read itable block Mar 10 03:23:33 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:23:33 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:24:26 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:24:26 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51512654: block 206045300: comm ls: unable to read itable block Mar 10 03:24:26 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:24:26 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:24:26 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:24:26 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #162269586: block 649068729: comm ls: unable to read itable block Mar 10 03:24:26 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected Mar 10 03:24:26 space kernel: Buffer I/O error on device dm-3, logical block 0 Mar 10 03:24:26 space kernel: lost page write due to I/O error on dm-3 Mar 10 03:24:26 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51512433: block 206045287: comm ls: unable to read itable block Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2) Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585030, block 0) Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51585029, block 0) Mar 10 03:24:31 space kernel: EXT4-fs 
warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51560456, block 0)
Mar 10 03:24:31 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51658758, block 0)
Mar 10 03:24:31 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:24:31 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:24:31 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:31 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511825: block 206045249: comm du: unable to read itable block
Mar 10 03:24:31 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:24:31 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:24:35 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:35 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511553: block 206045232: comm smbd: unable to read itable block
Mar 10 03:24:35 space kernel: EXT4-fs (dm-3): previous I/O error to superblock detected
Mar 10 03:24:35 space kernel: Buffer I/O error on device dm-3, logical block 0
Mar 10 03:24:35 space kernel: lost page write due to I/O error on dm-3
Mar 10 03:24:35 space kernel: EXT4-fs error (device dm-3): __ext4_get_inode_loc:4027: inode #51511557: block 206045232: comm smbd: unable to read itable block
Mar 10 03:24:35 space kernel: EXT4-fs warning (device dm-3): __ext4_read_dirblock:908: error reading directory block (ino 51512001, block 2)

###############################################################################################
2. Since the failure I have done no writes on those disks.

###############################################################################################
3. smartctl long and short tests show the disks are ok; I can provide the output should you think it is useful.
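Going back to the kernel log in section 1: the order in which md dropped the members can be pulled out of a large log with a short script. This is only a minimal sketch, assuming the two md/raid message formats shown above; `failure_sequence` and `SAMPLE_LOG` are names made up for this illustration, and the sample lines are copied from the log in this post.

```python
import re

# Matches e.g. "md/raid:md127: Disk failure on sdc, disabling device."
FAILURE_RE = re.compile(r"md/raid:(\w+): Disk failure on (\w+), disabling device")
# Matches e.g. "md/raid:md127: Operation continuing on 4 devices."
REMAINING_RE = re.compile(r"md/raid:(\w+): Operation continuing on (\d+) devices")


def failure_sequence(log_lines):
    """Return (array, device, devices_left) tuples in the order md logged them."""
    events = []
    pending = None  # last "Disk failure" seen, waiting for its "continuing" line
    for line in log_lines:
        m = FAILURE_RE.search(line)
        if m:
            pending = (m.group(1), m.group(2))
            continue
        m = REMAINING_RE.search(line)
        if m and pending and pending[0] == m.group(1):
            events.append((pending[0], pending[1], int(m.group(2))))
            pending = None
    return events


# Lines copied from the log above (hypothetical variable name).
SAMPLE_LOG = [
    "Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sdc, disabling device.",
    "Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 4 devices.",
    "Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sda, disabling device.",
    "Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 3 devices.",
    "Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sdb, disabling device.",
    "Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 2 devices.",
    "Mar 9 19:36:36 space kernel: md: super_written gets error=-19, uptodate=0",
    "Mar 9 19:36:36 space kernel: md/raid:md127: Disk failure on sdd, disabling device.",
    "Mar 9 19:36:36 space kernel: md/raid:md127: Operation continuing on 1 devices.",
]

for array, device, left in failure_sequence(SAMPLE_LOG):
    print(f"{array}: {device} failed, {left} device(s) left")
```

On the full log one would pass the lines of the log file instead of `SAMPLE_LOG`. Keep in mind that the device names are the ones in use at the time of the failure and may not match the letters in the later sections.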
############################################################################
4. the "mdadm --examine" output (I've added "<<<<" markers next to the timestamps and event counts):

/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100 (local to host storage00server)
  Creation Time : Thu May 9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : 74882e49:8294ae56:1c6eafbe:2c9eb6ec
    Update Time : Fri Mar 9 11:33:32 2018 <<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : e0b8ef21 - correct
         Events : 2444205 <<<<<<<<<<<<<<<<<<<<<<<
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 0
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdb:
      MBR Magic : aa55
   Partition[0] : 4294967295 sectors at 1 (type ee)

/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100 (local to host storage00server)
  Creation Time : Thu May 9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : 325fcaac:8195916b:8cb2871b:3f54f1c4
    Update Time : Fri Mar 9 11:33:32 2018 <<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 8e4ac163 - correct
         Events : 2444205 <<<<<<<<<<<<<<<<<<<<<<<
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 2
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100 (local to host storage00server)
  Creation Time : Thu May 9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : fd3ccca5:2f0ec0af:1e1f64f8:be53ce86
    Update Time : Fri Mar 9 11:33:32 2018 <<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 6a3483eb - correct
         Events : 2444205 <<<<<<<<<<<<<<<<<<<<<<<
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 1
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdg:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
           Name : storage00server:100 (local to host storage00server)
  Creation Time : Thu May 9 21:09:42 2013
     Raid Level : raid6
   Raid Devices : 5
 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=944 sectors
          State : clean
    Device UUID : 3fe05e31:aea12f6f:30219c17:c858e069
    Update Time : Sat Mar 10 03:28:16 2018 <<<<<<<<<<<<<<<<<<<<<<<
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b776a20c - correct
         Events : 2444333 <<<<<<<<<<<<<<<<<<<<<<<
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 3
    Array State : ...A. ('A' == active, '.' == missing, 'R' == replacing)

So one disk out of five is completely gone (sdb, from the raid6 array's point of view?). Three of the others, sda, sde and sdf, have the same event count (2 444 205) and the same timestamp (Fri Mar 9 11:33:32 2018). The last one, sdg, has a later timestamp (Sat Mar 10 03:28:16 2018) and a higher event count (2 444 333).

############################################################################
5. /dev/md127 is automatically recognized by the system at boot and brought to the state shown in the output below. It seems that the system is trying to assemble /dev/md127 automatically using the disk with the latest timestamp.
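
The event counts and update times compared in point 4 can be condensed to one line per device with a small filter. This is only a sketch: the name summarize_examine is a hypothetical helper, and it assumes the usual "mdadm --examine" layout where "Update Time" appears before "Events" for each device. It only reads text and never touches the disks.

```shell
# Hypothetical helper: reads "mdadm --examine" output on stdin and prints
# "device events update-time" for every device that has an md superblock.
summarize_examine() {
    awk '
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }
        /Update Time :/ { sub(/^.*Update Time : */, ""); upd[dev] = $0 }
        /Events :/      { print dev, $NF, upd[dev] }
    '
}
```

A possible invocation is "mdadm --examine /dev/sd[a-g] | summarize_examine" (the device glob is an assumption). Devices without an md superblock, such as sdb above, print nothing, and a member whose event count differs from the others stands out immediately.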

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md0 : active raid1 sdf1[3] sdg1[2]
      716736 blocks super 1.0 [2/2] [UU]

md4 : active raid0 sdi[1] sdh[0]
      3906766848 blocks super 1.2 512k chunks

md127 : active raid6 sda[5](F) sdb[7](F) sde[8] sdc[9](F)
      8790405120 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/1] [___U_]

md1 : active raid1 sdg2[2] sdf2[1]
      116436864 blocks super 1.1 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu May 9 21:09:42 2013
     Raid Level : raid6
  Used Dev Size : -1
   Raid Devices : 5
  Total Devices : 1
    Persistence : Superblock is persistent
    Update Time : Sat Mar 10 03:28:16 2018
          State : active, FAILED, Not Started
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 512K
           Name : storage00server:100 (local to host storage00server)
           UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
         Events : 2444333

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       2       0        0        2      removed
       4       0        0        4      removed
       8       8       80        3      active sync   /dev/sdf <<<<<<<<<< (previously detected as sdg and having the greatest number of events - 2444333)
       8       0        0        8      removed

############################################################################
6. So, having 5 devices in the raid6, I had the fancy idea of assembling the array using only the three drives that have the same event count and timestamp, but I got this output:

# mdadm --verbose --assemble --readonly /dev/md13 /dev/sda /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md13
mdadm: Found some drive for an array that is already active: /dev/md/storage00server:100
mdadm: giving up. !!!

OK, that does not look like a good idea; let's run "mdadm --stop /dev/md127" and then reuse the old name, md127:

# mdadm --verbose --assemble --readonly /dev/md127 /dev/sda /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md127
mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 1.
mdadm: added /dev/sdg to /dev/md127 as 1
mdadm: added /dev/sdf to /dev/md127 as 2
mdadm: no uptodate device for slot 3 of /dev/md127
mdadm: no uptodate device for slot 4 of /dev/md127
mdadm: added /dev/sda to /dev/md127 as 0
mdadm: /dev/md127 assembled from 3 drives - need all 5 to start it (use --run to insist).

Should I insist?

############################################################################
I am now in the process of imaging the physical disks with dd+bzip before trying anything potentially dangerous, and this is quite time-consuming. So before I do anything with the array I also have to figure out how to get enough disk space for these copies.

I really need my data back: I have a partial backup from 1-2 months ago, but I have since added new files that are quite important.

Anyway, to circle back to the beginning of the email: any ideas would be appreciated, and feel free to ask for more info if needed.

Kind Regards,
JL