[Bug 219467] New: Adaptec 71605 hangs with aacraid: Host adapter abort request after update to linux 6.11.5

https://bugzilla.kernel.org/show_bug.cgi?id=219467

            Bug ID: 219467
           Summary: Adaptec 71605 hangs with aacraid: Host adapter abort
                    request after update to linux 6.11.5
           Product: SCSI Drivers
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: AACRAID
          Assignee: scsi_drivers-aacraid@xxxxxxxxxxxxxxxxxxxx
          Reporter: kernel-bugzilla@xxxxxxxxxxxxx
        Regression: No

On October 31st I upgraded a system from Fedora 40 to Fedora 41. This upgraded
the kernel from 6.10.6-200.fc40.x86_64 to 6.11.5-300.fc41.x86_64. One of the
system's primary uses is as a NAS using an Adaptec 71605 and zfs-2.2.6. The
system scrubs its two zfs filesystems every Monday, including Oct 28th and
Nov 4th. On Oct 28th it was still running the 6.10.6 kernel; today (Nov 4th)
it was running the 6.11.5 kernel.

The errors kept repeating until I woke up and found that the scrubs had
stopped because of zfs errors caused by the controller errors. After a bit I
rebooted the system, and then had to stop the scrubs again because they had
restarted automatically. I then installed 6.10.14-200.fc40.x86_64 and
restarted the scrubs.

The scrub processes started shortly before 4am. You can see from the timing of
the logs below that the errors didn't start until more than two hours into the
scrub. The house thermostat is set to 73F/76F, and the outside temperature at
6am was 45F, so the room shouldn't have been unusually hot.

I saw zfs read and write errors on all the drives on the 71605.

I restarted the scrubs after downgrading to 6.10.14. They have been running
for about three hours since then, which is already longer than they lasted on
6.11.5. I will update with a new comment when the scrubs either throw an error
or complete.

I built the system in May of 2021, and it hasn't given me any issues like
this before. It started with a 5.11.12-300.fc34 kernel.

I did look for a newer version of the disk controller's BIOS, but found it is
already at the latest, 32118.

System hardware:
AMD Ryzen 9 5950X, processor
Kingston 128 GB (4x32 GB) DDR4 ECC, memory
ASUS Pro WS X570-ACE, motherboard
Adaptec 71605, disk controller
6 WD 18 TB SATA, drives (one on the 71605, rest on other controllers)
9 WD 8 TB SATA, drives (all on the 71605)

BIOS/Firmware versions:
BIOS                                       : 7.5-0 (32118)
Firmware                                   : 7.5-0 (32118)

An older, but very similar bug:
https://bugzilla.kernel.org/show_bug.cgi?id=217599

Timing of scrubs and errors:
Nov 04 03:46:01 storage zed[2545101]: eid=11 class=scrub_start pool='data18'
Nov 04 03:46:11 storage zed[2545231]: eid=13 class=scrub_start pool='data8'
Nov 04 06:08:38 storage kernel: aacraid: Host adapter abort request.
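
To put a number on "over two hours", here is a minimal sketch that computes
the gap between each scrub start and the first abort request from the three
timestamps above. The year is an assumption (the journal's short timestamps
omit it), and the helper name elapsed() is only illustrative.

```python
from datetime import datetime, timedelta

# Assumption: the short journal timestamps omit the year, so one is supplied
# here. Only the difference between two stamps matters for this calculation.
ASSUMED_YEAR = 2024


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two stamps such as 'Nov 04 03:46:01'."""
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{ASSUMED_YEAR} {start}", fmt)
    t1 = datetime.strptime(f"{ASSUMED_YEAR} {end}", fmt)
    return t1 - t0


# Timestamps copied from the log lines above.
FIRST_ABORT = "Nov 04 06:08:38"
SCRUB_STARTS = {"data18": "Nov 04 03:46:01", "data8": "Nov 04 03:46:11"}

for pool, started in SCRUB_STARTS.items():
    print(f"{pool}: {elapsed(started, FIRST_ABORT)} until first abort")
# data18: 2:22:37 until first abort
# data8: 2:22:27 until first abort
```

So the first abort request arrived roughly 2 hours 22 minutes into both scrubs.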

Errors:
Nov 04 06:08:38 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
                                aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: midlevel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: lowlevel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error handler-8
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: firmware-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Controller reset type is 3
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
Nov 04 06:11:19 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Adapter health - -3
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: midlevel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: lowlevel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error handler-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: firmware-124
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Controller reset type is 3
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:11:19 storage kernel:  rfkill wmi_bmof snd_timer drm_ttm_helper
pcspkr ttm k10temp i2c_piix4 snd i2c_smbus video soundcore igc nfsd auth_rpcgss
nfs_acl lockd grace sunrpc loop nfnetlink crct10dif_pclmul crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel mxm_wmi
nvme sha512_ssse3 aacraid sha256_ssse3 sha1_ssse3 nvme_core sp5100_tco
nvme_auth wmi ip6_tables ip_tables fuse
Nov 04 06:11:19 storage kernel:  src_sync_cmd+0x108/0x2e0 [aacraid]
Nov 04 06:11:19 storage kernel:  aac_src_restart_adapter.part.0+0x112/0x2b6 [aacraid]
Nov 04 06:11:19 storage kernel:  aac_reset_adapter+0xeb/0x650 [aacraid]
Nov 04 06:11:19 storage kernel:  aac_eh_host_reset+0x62/0xe0 [aacraid]
Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
Nov 04 06:12:34 storage kernel:  mxm_wmi nvme sha512_ssse3 aacraid
Nov 04 06:13:04 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: Adapter health - -3
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: midlevel-0
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: lowlevel-0
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error handler-0
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: firmware-1
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Controller reset type is 3
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:13:05 storage kernel:  rfkill wmi_bmof snd_timer drm_ttm_helper
pcspkr ttm k10temp i2c_piix4 snd i2c_smbus video soundcore igc nfsd auth_rpcgss
nfs_acl lockd grace sunrpc loop nfnetlink crct10dif_pclmul crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel mxm_wmi
nvme sha512_ssse3 aacraid sha256_ssse3 sha1_ssse3 nvme_core sp5100_tco
nvme_auth wmi ip6_tables ip_tables fuse
Nov 04 06:13:05 storage kernel:  src_sync_cmd+0x108/0x2e0 [aacraid]
Nov 04 06:13:05 storage kernel:  aac_src_restart_adapter.part.0+0x112/0x2b6 [aacraid]
Nov 04 06:13:05 storage kernel:  aac_reset_adapter+0xeb/0x650 [aacraid]
Nov 04 06:13:05 storage kernel:  aac_eh_host_reset+0x62/0xe0 [aacraid]
Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
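
For anyone comparing against their own logs, here is a minimal sketch that
pairs each "Issuing IOP reset" line with the following "IOP reset failed" line
and reports how long each reset attempt ran before failing. The year and the
helper names (parse_stamp, reset_durations) are assumptions made for
illustration; the log lines themselves are copied from the excerpt above.

```python
from datetime import datetime

# Assumption: the short journal timestamps omit the year, so one is supplied.
ASSUMED_YEAR = 2024


def parse_stamp(line: str) -> datetime:
    """Parse the leading 15-character stamp, e.g. 'Nov 04 06:09:08'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def reset_durations(lines: list[str]) -> list[int]:
    """Seconds between each 'Issuing IOP reset' and the next 'IOP reset failed'."""
    durations = []
    started = None
    for line in lines:
        if "Issuing IOP reset" in line:
            started = parse_stamp(line)
        elif "IOP reset failed" in line and started is not None:
            durations.append(int((parse_stamp(line) - started).total_seconds()))
            started = None
    return durations


LOG = [
    "Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset",
    "Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: IOP reset failed",
    "Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset",
    "Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: IOP reset failed",
    "Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset",
    "Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: IOP reset failed",
]

print(reset_durations(LOG))  # [71, 75, 75]
```

Each of the three reset attempts in this excerpt ran for 71 to 75 seconds
before the driver reported failure.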
