On Fri, May 5, 2023 at 7:19 PM Eyal Lebedinsky <fedora@xxxxxxxxxxxxxx> wrote:
>
> For a long time I noticed that at boot time I often see disk errors, but later on all seems well.
> Below is an example of relevant log messages after a boot.
>
> Initially things are normal for all (7) disks in the array, then there is a burst of messages for sdb, including two resets.
> I marked the sdb messages. It is as if this one disk takes longer to come up.
>
> I see this on three disks but not on the other four (all are the same model, Seagate ST12000NM0007 [Yes, I know]).
>
> I wonder if this situation can be related to the controller (LSISAS2008) or maybe the cabling.
> Four cables attach to a socket (there are two on this controller) and only three of the disks on one bundle show the problem
> and not the fourth, and none of the three on the second bundle have issues.
>
> Then again it may indicate a disk issue, and an RMA is due? I regularly run an "Extended offline" test and it is always successful.
> Or maybe some timeout is too short (can I set it?).
>
> Following such an incident I see smartctl reporting an increase in Command_Timeout and UDMA_CRC_Error_Count.
>
> TIA
> Eyal
>
> ================ log start ==============
> 2023-05-05T17:15:44+1000 kernel: Linux version 6.2.14-100.fc36.x86_64 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Mon May 1 00:54:35 UTC 2023
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (32705204 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: MSI-X vectors supported: 1
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: 0 1 1
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: High IOPs queues : disabled
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: iomem(0x00000000514c0000), mapped(0x00000000d8efeca3), size(16384)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: ioport(0x0000000000004000), size(256)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(9), sge_per_io(128), chains_per_io(15)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: request pool(0x000000003049b737) - dma(0x111800000): depth(3492), frame_size(128), pool_size(436 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: sense pool(0x000000008e6843eb) - dma(0x111f00000): depth(3367), element_size(96), pool_size (315 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: reply pool(0x00000000acd81aaa) - dma(0x111f80000): depth(3556), frame_size(128), pool_size(444 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: config page(0x00000000c56162d9) - dma(0x111eb5000): size(512)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Allocated physical memory: size(7579 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Scatter Gather Elements per IO(128)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Protocol=(Initiator,Target
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: sending port enable !!
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: hba_port entry: 00000000e9b01ff1, port: 255 is added to hba_port list
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b0013ca580), phys(8)
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: handle(0x9) sas_address(0x4433221100000000) port_type(0x1)
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: handle(0xa) sas_address(0x4433221101000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xb) sas_address(0x4433221102000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xc) sas_address(0x4433221103000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xd) sas_address(0x4433221105000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xe) sas_address(0x4433221106000000) port_type(0x1)
> 2023-05-05T17:15:49+1000 kernel: mpt2sas_cm0: handle(0xf) sas_address(0x4433221107000000) port_type(0x1)
> 2023-05-05T17:15:53+1000 kernel: mpt2sas_cm0: port enable: SUCCESS
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0 <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: Attached scsi generic sg3 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: Attached scsi generic sg4 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: Attached scsi generic sg5 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB) <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] 4096-byte physical blocks <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: Attached scsi generic sg6 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: Attached scsi generic sg7 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sdd: sdd1
> 2023-05-05T17:15:53+1000 kernel: sdh: sdh1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: sdg: sdg1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: sdc: sdc1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: sdf: sdf1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: sde: sde1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01) <<<<< start
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred <<<<<
> 2023-05-05T17:15:53+1000 kernel: sdb: sdb1 <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Attached SCSI disk <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Unaligned partial completion (resid=1020, sector_sz=512)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 CDB: Read(16) 88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 Sense Key : Aborted Command [current]
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 Add. Sense: Information unit iuCRC error detected
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 CDB: Read(16) 88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00
> 2023-05-05T17:15:53+1000 kernel: I/O error, dev sdb, sector 23437770624 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] Unaligned partial completion (resid=1020, sector_sz=512)
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 CDB: Read(16) 88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 Sense Key : Aborted Command [current]
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 Add. Sense: Information unit iuCRC error detected
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 CDB: Read(16) 88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00
> 2023-05-05T17:15:54+1000 kernel: I/O error, dev sdb, sector 23437770352 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 2023-05-05T17:15:54+1000 kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#51 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#51 CDB: Read(16) 88 00 00 00 00 05 74 ff f3 f0 00 00 00 08 00 00
> 2023-05-05T17:15:54+1000 kernel: I/O error, dev sdb, sector 23437767664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred <<<<< end
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdh1 operational as raid disk 6
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdf1 operational as raid disk 4
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdb1 operational as raid disk 0
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdd1 operational as raid disk 2
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdc1 operational as raid disk 1
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdg1 operational as raid disk 5
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sde1 operational as raid disk 3
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: raid level 6 active with 7 out of 7 devices, algorithm 2
> 2023-05-05T17:16:01+1000 kernel: md127: detected capacity change from 0 to 117187522560
> 2023-05-05T17:16:03+1000 kernel: EXT4-fs (md127): mounted filesystem 378e74a6-e379-4bd5-ade5-f3cd85952099 with ordered data mode. Quota mode: none.
>

CRC errors are usually cable/connection issues. I used to get similar errors; I took apart and cleaned/vacuumed all of the connectors, and most of the CRC issues went away. I had errors on both on-motherboard and SAS2008 ports. On other SATA controllers, each reset for a CRC issue seems to reduce the link speed by half, and that "fixes" the errors.

Vacuum out the dust, and/or wipe down with alcohol, and/or blow out with air, all of the connectors: the motherboard/SAS2008 ports, all cable ends, and the disks/hot-swap enclosures if you have them.
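
As a sanity check on the quoted log, here is a minimal Python sketch that decodes the three failed Read(16) CDBs and compares them with the sectors in the "I/O error" lines. The helper names (decode_read16, check_failed_reads) are mine, not from any tool; the CDB bytes, sector numbers and the disk size are copied from the log above. It assumes the standard READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13).

```python
# Hypothetical helper: decode a SCSI READ(16) CDB printed by the kernel
# as space-separated hex bytes.
def decode_read16(cdb_hex):
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


# From "[sdb] 23437770752 512-byte logical blocks" in the log.
TOTAL_BLOCKS = 23437770752

# tag -> (CDB from the log, sector from the matching "I/O error" line)
FAILED_READS = {
    "tag#33": ("88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00", 23437770624),
    "tag#42": ("88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00", 23437770352),
    "tag#51": ("88 00 00 00 00 05 74 ff f3 f0 00 00 00 08 00 00", 23437767664),
}


def check_failed_reads():
    """Return tag -> (lba, blocks, sectors_from_end_of_disk)."""
    results = {}
    for tag, (cdb_hex, logged_sector) in FAILED_READS.items():
        lba, blocks = decode_read16(cdb_hex)
        if lba != logged_sector:
            raise ValueError(f"{tag}: CDB LBA {lba} != logged sector {logged_sector}")
        results[tag] = (lba, blocks, TOTAL_BLOCKS - lba)
    return results


if __name__ == "__main__":
    for tag, (lba, blocks, from_end) in check_failed_reads().items():
        print(f"{tag}: lba={lba} blocks={blocks} sectors_from_end={from_end}")
```

All three are 8-block reads within the last few thousand sectors of the disk, and the failures are reported as iuCRC / aborted commands rather than medium errors, which seems consistent with a link problem rather than a bad spot on the platter.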