Re: HDD errors during boot

On Fri, May 5, 2023 at 7:19 PM Eyal Lebedinsky <fedora@xxxxxxxxxxxxxx> wrote:
>
> For a long time I noticed that at boot time I often see disk errors, but later on all seems well.
> Below is an example of relevant log messages after a boot.
>
> Initially things are normal for all (7) disks in the array, then there is a burst of messages for sdb, including two resets.
> I marked the sdb messages. It is as if this one disk takes longer to come up.
>
> I see this on three disks but not on the other four (all are the same model, Seagate ST12000NM0007 [Yes, I know]).
>
> I wonder if this situation can be related to the controller (LSISAS2008) or maybe the cabling.
> Four cables attach to a socket (there are two on this controller) and only three of the disks on one bundle show the problem
> and not the fourth, and none of the three on the second bundle have issues.
>
> Then again it may indicate a disk issue, and an RMA is due? I regularly run an "Extended offline" test and it is always successful.
> Or maybe some timeout is too short (can I set it?).
>
> Following such an incident I see smartctl reporting an increase in Command_Timeout and UDMA_CRC_Error_Count.
>
> TIA
>         Eyal
>
> ================ log start ==============
> 2023-05-05T17:15:44+1000 kernel: Linux version 6.2.14-100.fc36.x86_64 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Mon May  1 00:54:35 UTC 2023
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (32705204 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: MSI-X vectors supported: 1
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0:  0 1 1
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: High IOPs queues : disabled
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: iomem(0x00000000514c0000), mapped(0x00000000d8efeca3), size(16384)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: ioport(0x0000000000004000), size(256)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(9), sge_per_io(128), chains_per_io(15)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: request pool(0x000000003049b737) - dma(0x111800000): depth(3492), frame_size(128), pool_size(436 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: sense pool(0x000000008e6843eb) - dma(0x111f00000): depth(3367), element_size(96), pool_size (315 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: reply pool(0x00000000acd81aaa) - dma(0x111f80000): depth(3556), frame_size(128), pool_size(444 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: config page(0x00000000c56162d9) - dma(0x111eb5000): size(512)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Allocated physical memory: size(7579 kB)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Scatter Gather Elements per IO(128)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: Protocol=(Initiator,Target
> 2023-05-05T17:15:45+1000 kernel: mpt2sas_cm0: sending port enable !!
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: hba_port entry: 00000000e9b01ff1, port: 255 is added to hba_port list
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b0013ca580), phys(8)
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: handle(0x9) sas_address(0x4433221100000000) port_type(0x1)
> 2023-05-05T17:15:47+1000 kernel: mpt2sas_cm0: handle(0xa) sas_address(0x4433221101000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xb) sas_address(0x4433221102000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xc) sas_address(0x4433221103000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xd) sas_address(0x4433221105000000) port_type(0x1)
> 2023-05-05T17:15:48+1000 kernel: mpt2sas_cm0: handle(0xe) sas_address(0x4433221106000000) port_type(0x1)
> 2023-05-05T17:15:49+1000 kernel: mpt2sas_cm0: handle(0xf) sas_address(0x4433221107000000) port_type(0x1)
> 2023-05-05T17:15:53+1000 kernel: mpt2sas_cm0: port enable: SUCCESS
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0                                   <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred                                  <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: Attached scsi generic sg3 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: Attached scsi generic sg4 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: Attached scsi generic sg5 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)      <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] 4096-byte physical blocks                                    <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: Attached scsi generic sg6 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: Attached scsi generic sg7 type 0
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: Power-on or device reset occurred
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] 4096-byte physical blocks
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Write Protect is off
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Mode Sense: 7f 00 10 08
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
> 2023-05-05T17:15:53+1000 kernel:  sdd: sdd1
> 2023-05-05T17:15:53+1000 kernel:  sdh: sdh1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:2:0: [sdd] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel:  sdg: sdg1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:5:0: [sdg] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:6:0: [sdh] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel:  sdc: sdc1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:1:0: [sdc] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel:  sdf: sdf1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:4:0: [sdf] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel:  sde: sde1
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:3:0: [sde] Attached SCSI disk
> 2023-05-05T17:15:53+1000 kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)        <<<<< start
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred                                          <<<<<
> 2023-05-05T17:15:53+1000 kernel:  sdb: sdb1                                                                             <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Attached SCSI disk                                                   <<<<<
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] Unaligned partial completion (resid=1020, sector_sz=512)
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 CDB: Read(16) 88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 Sense Key : Aborted Command [current]
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 Add. Sense: Information unit iuCRC error detected
> 2023-05-05T17:15:53+1000 kernel: sd 6:0:0:0: [sdb] tag#33 CDB: Read(16) 88 00 00 00 00 05 74 ff ff 80 00 00 00 08 00 00
> 2023-05-05T17:15:53+1000 kernel: I/O error, dev sdb, sector 23437770624 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] Unaligned partial completion (resid=1020, sector_sz=512)
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 CDB: Read(16) 88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 Sense Key : Aborted Command [current]
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 Add. Sense: Information unit iuCRC error detected
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#42 CDB: Read(16) 88 00 00 00 00 05 74 ff fe 70 00 00 00 08 00 00
> 2023-05-05T17:15:54+1000 kernel: I/O error, dev sdb, sector 23437770352 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 2023-05-05T17:15:54+1000 kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#51 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: [sdb] tag#51 CDB: Read(16) 88 00 00 00 00 05 74 ff f3 f0 00 00 00 08 00 00
> 2023-05-05T17:15:54+1000 kernel: I/O error, dev sdb, sector 23437767664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 2023-05-05T17:15:54+1000 kernel: sd 6:0:0:0: Power-on or device reset occurred                                          <<<<< end
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdh1 operational as raid disk 6
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdf1 operational as raid disk 4
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdb1 operational as raid disk 0
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdd1 operational as raid disk 2
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdc1 operational as raid disk 1
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sdg1 operational as raid disk 5
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: device sde1 operational as raid disk 3
> 2023-05-05T17:16:01+1000 kernel: md/raid:md127: raid level 6 active with 7 out of 7 devices, algorithm 2
> 2023-05-05T17:16:01+1000 kernel: md127: detected capacity change from 0 to 117187522560
> 2023-05-05T17:16:03+1000 kernel: EXT4-fs (md127): mounted filesystem 378e74a6-e379-4bd5-ade5-f3cd85952099 with ordered data mode. Quota mode: none.
>

CRC errors are usually cable/connection issues.
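
To see which disks are actually reporting CRC errors, something like the following could tally the "iuCRC error detected" sense lines per device. This is only a rough sketch: the function name and regex are mine, and the sample lines are trimmed copies of lines from the log quoted above.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   sd 6:0:0:0: [sdb] tag#33 Add. Sense: Information unit iuCRC error detected
CRC_LINE = re.compile(r"\[(sd[a-z]+)\].*iuCRC error detected")


def count_crc_errors(lines):
    """Return a Counter mapping device name -> number of iuCRC sense lines."""
    counts = Counter()
    for line in lines:
        match = CRC_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: two CRC lines and one unrelated line from the quoted log.
sample = [
    "kernel: sd 6:0:0:0: [sdb] tag#33 Add. Sense: Information unit iuCRC error detected",
    "kernel: sd 6:0:0:0: [sdb] tag#42 Add. Sense: Information unit iuCRC error detected",
    "kernel: sd 6:0:1:0: [sdc] Attached SCSI disk",
]
print(dict(count_crc_errors(sample)))
```

Passing the lines of `journalctl -k -b` output instead of `sample` should give a per-disk count for the current boot.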

I used to get similar errors. I took everything apart and cleaned/vacuumed
all of the connectors, and most of the CRC issues went away.   I had errors
on both the on-motherboard and the SAS2008 ports.

On other SATA controllers, each reset for a CRC issue seems to halve
the link speed, and that "fixes" the errors.
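
If you want to check whether a port has been slowed down like that, here is a rough sketch that reads the current speed and the speed limit for each link. It assumes the ports are handled by libata and that the kernel exposes the `sata_spd` and `sata_spd_limit` attributes under /sys/class/ata_link. Disks behind the SAS2008 are not handled by libata, so they will not show up here. The function name is made up for illustration.

```python
from pathlib import Path


def read_sata_speeds(root="/sys/class/ata_link"):
    """Return {link_name: (current_speed, speed_limit)} for each libata link.

    Missing attributes are reported as None, and a missing root directory
    gives an empty dict, so the sketch also runs on machines without
    libata-managed ports.
    """
    speeds = {}
    base = Path(root)
    if not base.is_dir():
        return speeds
    for link in sorted(base.iterdir()):
        values = []
        for attr in ("sata_spd", "sata_spd_limit"):
            path = link / attr
            values.append(path.read_text().strip() if path.is_file() else None)
        speeds[link.name] = tuple(values)
    return speeds


for name, (current, limit) in read_sata_speeds().items():
    print(f"{name}: current={current} limit={limit}")
```

A limit lower than what the port and disk both support would be consistent with the link having been dropped after resets.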

Vacuum out the dust, wipe down with alcohol, and/or blow out the various
connectors with air: the mb/SAS2008 ports, all cable ends, and the
disks/hot-swap enclosures if you have them.
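
To judge whether the cleaning helped, you could save the `smartctl -A` output for a disk before and after and compare the raw values of the two counters you mentioned (Command_Timeout and UDMA_CRC_Error_Count). A minimal sketch follows. The helper names are mine, and the sample rows use made-up numbers only to show the column layout, they are not taken from your disks.

```python
def parse_smart_raw(text, names=("Command_Timeout", "UDMA_CRC_Error_Count")):
    """Extract the raw value of selected attributes from `smartctl -A` text.

    Assumes the usual ATA attribute table layout, where the attribute name
    is the 2nd column and the raw value starts at the 10th column. Only the
    first token of the raw value is used.
    """
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] in names:
            raw[fields[1]] = int(fields[9])
    return raw


def smart_delta(before, after):
    """Return how much each attribute grew between two snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# Illustrative rows only: the numbers below are invented for the example.
before_text = """\
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""
after_text = """\
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       4
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       15
"""
print(smart_delta(parse_smart_raw(before_text), parse_smart_raw(after_text)))
```

If the counters stop growing across several boots after reseating and cleaning, that points at the connection rather than the disk.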