https://bugzilla.kernel.org/show_bug.cgi?id=219467

            Bug ID: 219467
           Summary: Adaptec 71605 hangs with aacraid: Host adapter abort
                    request after update to linux 6.11.5
           Product: SCSI Drivers
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: AACRAID
          Assignee: scsi_drivers-aacraid@xxxxxxxxxxxxxxxxxxxx
          Reporter: kernel-bugzilla@xxxxxxxxxxxxx
        Regression: No

On October 31st I upgraded a system from Fedora 40 to Fedora 41. This upgraded the kernel from 6.10.6-200.fc40.x86_64 to 6.11.5-300.fc41.x86_64. One of the system's primary uses is as a NAS using an Adaptec 71605 and zfs-2.2.6.

The system runs zfs scrubs on the two zfs filesystems on Mondays, such as Oct 28th and Nov 4th. On Oct 28th it was still on the 6.10.6 kernel, and today (Nov 4th) it was on the 6.11.5 kernel. The errors repeated until I woke up and found that the scrubs had stopped because of zfs errors caused by the controller errors. After a bit I rebooted the system, and then had to stop the scrubs again because they had restarted automatically. I then installed 6.10.14-200.fc40.x86_64 and restarted the scrubs.

The scrub processes started at nearly 4am. You can see from the timing of the logs below that the errors didn't start until over two hours into the scrub. The house thermostat is set to 73F/76F, and the outside temperature at 6am was 45F, so the room shouldn't have been unusually hot. I saw zfs read and write errors on all the drives on the 71605.

It has been about three hours since I restarted the scrubs after downgrading to 6.10.14, which means it has already lasted longer than 6.11.5 did. I will update with a new comment when it either throws an error or completes.

I built the system in May of 2021, and it hasn't given me any issues like this before. It started with a 5.11.12-300.fc34 kernel. I did look for a newer version of the disk controller's BIOS, but found it is already the latest, 32118.
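For reference, the "over two hours" figure can be checked directly from the journal timestamps quoted in this report. The snippet below is only an illustrative sketch: the year (2024) is an assumption, since the journal lines do not include one, and the helper name `parse` is made up for this example. It also assumes an English locale so that `%b` matches "Nov".

```python
from datetime import datetime

# Timestamps are copied from the journal lines in this report.
# The year is NOT in the log output; 2024 is an assumption.
FMT = "%Y %b %d %H:%M:%S"

def parse(stamp: str, year: int = 2024) -> datetime:
    """Parse a syslog-style 'Mon DD HH:MM:SS' stamp (hypothetical helper)."""
    return datetime.strptime(f"{year} {stamp}", FMT)

scrub_start = parse("Nov 04 03:46:01")  # zed: scrub_start pool='data18'
first_abort = parse("Nov 04 06:08:38")  # first "Host adapter abort request."

gap = first_abort - scrub_start
print(gap)  # 2:22:37
```

So the first abort request appeared roughly 2 hours 23 minutes after the first scrub started.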

System hardware:
AMD Ryzen 9 5950X, processor
Kingston 128GB (4x32GB) DDR4 ECC, memory
ASUS Pro WS X570-ACE, motherboard
Adaptec 71605, disk controller
6 WD 18TB SATA, drives (one on the 71605, rest on other controllers)
9 WD 8TB SATA, drives (all on the 71605)

BIOS/Firmware versions:
BIOS     : 7.5-0 (32118)
Firmware : 7.5-0 (32118)

An older, but very similar bug:
https://bugzilla.kernel.org/show_bug.cgi?id=217599

Timing of scrubs and errors:
Nov 04 03:46:01 storage zed[2545101]: eid=11 class=scrub_start pool='data18'
Nov 04 03:46:11 storage zed[2545231]: eid=13 class=scrub_start pool='data8'
Nov 04 06:08:38 storage kernel: aacraid: Host adapter abort request.

Errors:
Nov 04 06:08:38 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: midlevel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: lowlevel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error handler-8
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: firmware-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Controller reset type is 3
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
Nov 04 06:11:19 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Adapter health - -3
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: midlevel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: lowlevel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error handler-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: firmware-124
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Controller reset type is 3
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:11:19 storage kernel: rfkill wmi_bmof snd_timer drm_ttm_helper pcspkr ttm k10temp i2c_piix4 snd i2c_smbus video soundcore igc nfsd auth_rpcgss nfs_acl lockd grace sunrpc loop nfnetlink crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel mxm_wmi nvme sha512_ssse3 aacraid sha256_ssse3 sha1_ssse3 nvme_core sp5100_tco nvme_auth wmi ip6_tables ip_tables fuse
Nov 04 06:11:19 storage kernel: src_sync_cmd+0x108/0x2e0 [aacraid]
Nov 04 06:11:19 storage kernel: aac_src_restart_adapter.part.0+0x112/0x2b6 [aacraid]
Nov 04 06:11:19 storage kernel: aac_reset_adapter+0xeb/0x650 [aacraid]
Nov 04 06:11:19 storage kernel: aac_eh_host_reset+0x62/0xe0 [aacraid]
Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
Nov 04 06:12:34 storage kernel: mxm_wmi nvme sha512_ssse3 aacraid
Nov 04 06:13:04 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: Adapter health - -3
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: midlevel-0
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: lowlevel-0
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error handler-0
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: firmware-1
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Controller reset type is 3
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:13:05 storage kernel: rfkill wmi_bmof snd_timer drm_ttm_helper pcspkr ttm k10temp i2c_piix4 snd i2c_smbus video soundcore igc nfsd auth_rpcgss nfs_acl lockd grace sunrpc loop nfnetlink crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel mxm_wmi nvme sha512_ssse3 aacraid sha256_ssse3 sha1_ssse3 nvme_core sp5100_tco nvme_auth wmi ip6_tables ip_tables fuse
Nov 04 06:13:05 storage kernel: src_sync_cmd+0x108/0x2e0 [aacraid]
Nov 04 06:13:05 storage kernel: aac_src_restart_adapter.part.0+0x112/0x2b6 [aacraid]
Nov 04 06:13:05 storage kernel: aac_reset_adapter+0xeb/0x650 [aacraid]
Nov 04 06:13:05 storage kernel: aac_eh_host_reset+0x62/0xe0 [aacraid]
Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.