HDD errors during boot

For a long time I have noticed that I often see disk errors at boot time, although everything seems fine afterwards.
Below is an example of the relevant log messages from one boot.

Initially things look normal for all seven disks in the array, then there is a burst of messages for sdb, including two resets.
I marked the sdb messages with "<<<<<" (the burst runs from "start" to "end"). It is as if this one disk takes longer to come up.

I see this on three disks but not on the other four (all are the same model, Seagate ST12000NM0007 [Yes, I know]).

I wonder if this could be related to the controller (LSISAS2008) or maybe the cabling.
The controller has two sockets, and each takes a bundle of four cables. Only three of the disks on one bundle show the problem,
not the fourth, and none of the three disks on the second bundle has issues.

Then again, it may indicate a disk problem. Is an RMA due? I regularly run an "Extended offline" test and it always succeeds.
Or maybe some timeout is too short (can I set it?).
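
On the timeout question: Linux exposes a per-device SCSI command timeout, in seconds, as /sys/block/<dev>/device/timeout. Below is a minimal Python sketch for reading and changing it. The function names and the sys_block parameter are made up for illustration, and writing the file needs root. Whether this timeout is involved at all is not established: the failed commands in the log report cmd_age=0s. The setting also does not survive a reboot, so it would have to be reapplied early in boot (for example from a udev rule) to affect boot-time behaviour.

```python
from pathlib import Path


def timeout_path(device, sys_block="/sys/block"):
    """Return the sysfs file holding the SCSI command timeout for e.g. 'sdb'."""
    return Path(sys_block) / device / "device" / "timeout"


def get_timeout(device, sys_block="/sys/block"):
    """Read the current command timeout in seconds."""
    return int(timeout_path(device, sys_block).read_text().strip())


def set_timeout(device, seconds, sys_block="/sys/block"):
    """Write a new command timeout in seconds (needs root on a real system)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_path(device, sys_block).write_text(f"{seconds}\n")


if __name__ == "__main__":
    # Read-only by default: print the timeout of each disk in the array.
    for dev in ("sdb", "sdc", "sdd", "sde", "sdf", "sdg", "sdh"):
        try:
            print(dev, get_timeout(dev))
        except OSError as exc:
            print(dev, "unavailable:", exc)
```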

Following such an incident I see smartctl reporting an increase in Command_Timeout and UDMA_CRC_Error_Count.
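
To track those two counters across boots, the attribute table printed by "smartctl -A" can be parsed and compared between snapshots. This is a rough Python sketch and not a tested tool. It assumes the usual ten-column attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), and the function names are hypothetical. Raw values are kept as strings because some drives print more than one number in that column.

```python
import subprocess

# The two attributes mentioned above.
WATCHED = ("Command_Timeout", "UDMA_CRC_Error_Count")


def parse_smart_attributes(text):
    """Map attribute name -> raw value string from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # everything from the 10th column on is the raw value.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def changed(before, after, names=WATCHED):
    """Return {name: (old_raw, new_raw)} for watched attributes that differ."""
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


def read_attributes(device):
    """Run smartctl on a device such as '/dev/sdb' (needs root)."""
    # smartctl's exit status is a bitmask of conditions, so it is not
    # treated as a failure indicator here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)
```

Saving the result of read_attributes() before a reboot and passing it to changed() together with a fresh reading afterwards would show which disks incremented their counters during that boot.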

TIA
	Eyal

================ log start ==============
2023-05-05T17:15:44+1000 kernel: Linux version 6.2.14-100.fc36.x86_64 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Mon May  1 00:54:35 UTC 2023
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (32705204 kB)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: MSI-X vectors supported: 1
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0:  0 1 1
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: High IOPs queues : disabled
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: iomem(0x00000000514c0000), mapped(0x00000000d8efeca3), size(16384)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: ioport(0x0000000000004000), size(256)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(9), sge_per_io(128), chains_per_io(15)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: request pool(0x000000003049b737) - dma(0x111800000): depth(3492), frame_size(128), pool_size(436 kB)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: sense pool(0x000000008e6843eb) - dma(0x111f00000): depth(3367), element_size(96), pool_size (315 kB)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: reply pool(0x00000000acd81aaa) - dma(0x111f80000): depth(3556), frame_size(128), pool_size(444 kB)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: config page(0x00000000c56162d9) - dma(0x111eb5000): size(512)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Allocated physical memory: size(7579 kB)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Scatter Gather Elements per IO(128)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Protocol=(Initiator,Target
2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: sending port enable !!
2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: hba_port entry: 00000000e9b01ff1, port: 255 is added to hba_port list
2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b0013ca580), phys(8)
2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: handle(0x9) sas_address(0x4433221100000000) port_type(0x1)
2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: handle(0xa) sas_address(0x4433221101000000) port_type(0x1)
2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xb) sas_address(0x4433221102000000) port_type(0x1)
2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xc) sas_address(0x4433221103000000) port_type(0x1)
2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xd) sas_address(0x4433221105000000) port_type(0x1)
2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xe) sas_address(0x4433221106000000) port_type(0x1)
2023-05-05T17:15:49+1000 kernel: mpt2sas_cm0: handle(0xf) sas_address(0x4433221107000000) port_type(0x1)
2023-05-05T17:15:53+1000 kernel: mpt2sas_cm0: port enable: SUCCESS
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0					<<<<<
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred					<<<<<
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: Attached scsi generic sg3 type 0
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: Power-on or device reset occurred
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: Attached scsi generic sg4 type 0
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: Power-on or device reset occurred
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: Attached scsi generic sg5 type 0
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: Power-on or device reset occurred
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)	<<<<<
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] 4096-byte physical blocks					<<<<<
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: Attached scsi generic sg6 type 0
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: Power-on or device reset occurred
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Write Protect is off
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Mode Sense: 7f 00 10 08
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] 4096-byte physical blocks
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: Attached scsi generic sg7 type 0
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: Power-on or device reset occurred
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Write Protect is off
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Mode Sense: 7f 00 10 08
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] 4096-byte physical blocks
2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: Power-on or device reset occurred
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] 4096-byte physical blocks
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Write Protect is off
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Mode Sense: 7f 00 10 08
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Write Protect is off
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] 4096-byte physical blocks
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] 4096-byte physical blocks
2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] 4096-byte physical blocks
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Write Protect is off
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Mode Sense: 7f 00 10 08
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Write Protect is off
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Mode Sense: 7f 00 10 08
2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Write Protect is off
2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Mode Sense: 7f 00 10 08
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Mode Sense: 7f 00 10 08
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
2023-05-05T17:15:53+1000 kernel:  sdd: sdd1
2023-05-05T17:15:53+1000 kernel:  sdh: sdh1
2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Attached SCSI disk
2023-05-05T17:15:53+1000 kernel:  sdg: sdg1
2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Attached SCSI disk
2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Attached SCSI disk
2023-05-05T17:15:53+1000 kernel:  sdc: sdc1
2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Attached SCSI disk
2023-05-05T17:15:53+1000 kernel:  sdf: sdf1
2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Attached SCSI disk
2023-05-05T17:15:53+1000 kernel:  sde: sde1
2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Attached SCSI disk
2023-05-05T17:15:53+1000 kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)	<<<<< start
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred						<<<<<
2023-05-05T17:15:53+1000 kernel:  sdb: sdb1										<<<<<
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Attached SCSI disk							<<<<<
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Unaligned partial completion (resid=1020, sector_sz=512)
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 CDB: Read(16) 88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 Sense Key : Aborted Command [current]
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 Add. Sense: Information unit iuCRC error detected
2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 CDB: Read(16) 88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00
2023-05-05T17:15:53+1000 kernel: I/O error, dev sdb, sector 23437770624 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] Unaligned partial completion (resid=1020, sector_sz=512)
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 CDB: Read(16) 88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 Sense Key : Aborted Command [current]
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 Add. Sense: Information unit iuCRC error detected
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 CDB: Read(16) 88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00
2023-05-05T17:15:54+1000 kernel: I/O error, dev sdb, sector 23437770352 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
2023-05-05T17:15:54+1000 kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#51 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#51 CDB: Read(16) 88 00 00 00 00 05 74 ff f3 f0 00 00 00 08 00 00
2023-05-05T17:15:54+1000 kernel: I/O error, dev sdb, sector 23437767664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred						<<<<< end
2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdh1 operational as raid disk 6
2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdf1 operational as raid disk 4
2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdb1 operational as raid disk 0
2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdd1 operational as raid disk 2
2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdc1 operational as raid disk 1
2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdg1 operational as raid disk 5
2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sde1 operational as raid disk 3
2023-05-05T17:16:01+1000 kernel: md/raid:md127: raid level 6 active with 7 out of 7 devices, algorithm 2
2023-05-05T17:16:01+1000 kernel: md127: detected capacity change from 0 to 117187522560
2023-05-05T17:16:03+1000 kernel: EXT4-fs (md127): mounted filesystem 378e74a6-e379-4bd5-ade5-f3cd85952099 with ordered data mode. Quota mode: none.
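
As a cross-check of the marked burst, the failed Read(16) CDBs and the mpt2sas log_info word can be decoded by hand. The Python sketch below assumes the standard Read(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13). The log_info split (4-bit bus type, 4-bit originator, 8-bit code, 16-bit sub-code) is chosen to agree with the fields the driver prints in the log. The function names are made up for illustration, and no meaning is claimed for code 0x11 or sub_code 0x0d01.

```python
def decode_read16(cdb_hex):
    """Return (lba, block_count) from a Read(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte Read(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")       # bytes 2..9
    blocks = int.from_bytes(cdb[10:14], "big")   # bytes 10..13
    return lba, blocks


def split_loginfo(value):
    """Split a 32-bit log_info word into the fields shown in the log line."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": (value >> 24) & 0xF,  # printed as "PL" in the log for 1
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


if __name__ == "__main__":
    total_blocks = 23437770752  # from the "[sdb] ... logical blocks" line
    for cdb in (
        "88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00",
        "88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00",
        "88 00 00 00 00 05 74 ff f3 f0 00 00 00 08 00 00",
    ):
        lba, blocks = decode_read16(cdb)
        print(lba, blocks, "sectors from end:", total_blocks - lba)
    print({k: hex(v) for k, v in split_loginfo(0x31110D01).items()})
```

The decoded LBAs are 23437770624, 23437770352 and 23437767664, matching the "I/O error, dev sdb, sector ..." lines. Each read is 8 blocks (4 KiB), and all three fall within the last 3088 sectors of the disk.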

--
Eyal Lebedinsky (fedora@xxxxxxxxxxxxxx)