[Bug 217599] Adaptec 71605z hangs with aacraid: Host adapter abort request after update to linux 6.4.0

https://bugzilla.kernel.org/show_bug.cgi?id=217599

Joop Boonen (joop.boonen@xxxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joop.boonen@xxxxxxxxxx

--- Comment #26 from Joop Boonen (joop.boonen@xxxxxxxxxx) ---
We have noticed that our server, which uses an Adaptec ASR8805 RAID controller
and runs Debian 12 (Bookworm) with kernel 6.1.55, gets 100% wait states that
cause the system to hang:
top - 12:57:32 up 7 min,  2 users,  load average: 5.02, 1.71, 0.65
Tasks: 451 total,   2 running, 449 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni, 81.8 id, 18.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu33 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu34 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu35 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu36 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu37 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu38 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu39 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257590.5 total, 242751.4 free,  10355.7 used,   6092.0 buff/cache    
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 247234.8 avail Mem

When running a kernel older than 6.1.53, we never see 100% wait states, and
certainly not sustained for a long time.
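The per-CPU snapshot above shows CPUs 25, 29 and 32 at 100.0 wa and CPU 16 at 100.0 sy. When collecting more snapshots, a small parser can flag such CPUs automatically. This is a minimal, hypothetical Python helper (the function names are illustrative, not from any existing tool), assuming the `%CpuN : ... us, ... sy, ...` line format that top prints in its per-CPU view:

```python
import re

# One per-CPU line from top, e.g.
# "%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st"
CPU_LINE = re.compile(r"^%Cpu(\d+)\s*:(.*)$")
# A percentage followed by its field name; values may be glued to a comma ("id,100.0 wa").
FIELD = re.compile(r"([\d.]+)\s*(us|sy|ni|id|wa|hi|si|st)\b")


def parse_top_cpus(text):
    """Return {cpu_number: {field: percent}} for every %CpuN line in text."""
    cpus = {}
    for line in text.splitlines():
        m = CPU_LINE.match(line.strip())
        if m:
            cpus[int(m.group(1))] = {
                name: float(value) for value, name in FIELD.findall(m.group(2))
            }
    return cpus


def stuck_cpus(cpus, field="wa", threshold=90.0):
    """Sorted CPU numbers whose `field` (I/O wait by default) is >= threshold."""
    return sorted(n for n, f in cpus.items() if f.get(field, 0.0) >= threshold)


# Four lines copied from the snapshot in this report.
sample = """\
%Cpu16 :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
"""
cpus = parse_top_cpus(sample)
print(stuck_cpus(cpus))        # [25, 29, 32]
print(stuck_cpus(cpus, "sy"))  # [16]
```

The threshold of 90% is an arbitrary choice for this sketch; lowering it would also catch partially blocked CPUs such as CPU 1 (18.2 wa).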

We also repeatedly saw:
[ 1376.837737] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.841731] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.842412] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.843004] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.843587] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.844169] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.844747] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.845322] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.845906] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.846484] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.847055] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.847628] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.848199] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.848767] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.849336] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.849995] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.850560] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.789765] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.889767] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.890899] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.892002] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.893103] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.897790] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.898918] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.900009] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.901094] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.902199] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.903287] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.904384] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.905472] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.906585] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.907678] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.945954] aacraid: Host bus reset request. SCSI hang ?
[ 1378.946602] aacraid 0000:af:00.0: outstanding cmd: midlevel-0
[ 1378.946607] aacraid 0000:af:00.0: outstanding cmd: lowlevel-0
[ 1378.946610] aacraid 0000:af:00.0: outstanding cmd: error handler-0
[ 1378.946613] aacraid 0000:af:00.0: outstanding cmd: firmware-32
[ 1378.946616] aacraid 0000:af:00.0: outstanding cmd: kernel-0
[ 1378.961850] aacraid 0000:af:00.0: Controller reset type is 3
[ 1378.962435] aacraid 0000:af:00.0: Issuing IOP reset
[ 1412.498211] aacraid 0000:af:00.0: IOP reset succeeded
[ 1412.523256] aacraid: Comm Interface type2 enabled
[ 1424.734176] aacraid 0000:af:00.0: Scheduling bus rescan
[ 1434.755589] aacraid 0000:af:00.0: DDR cache data recovered successfully
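When comparing kernels (for example 6.1.55 with and without the patch reverted), it can help to reduce a kernel log like the one above to a few numbers: how many abort requests occurred, what the driver reported as outstanding, and how long it took from the first abort to the successful IOP reset. A minimal sketch, assuming the exact message texts quoted above; `summarize_aacraid` is a hypothetical helper, not part of the driver or any tool:

```python
import re
from collections import Counter

# Timestamped dmesg line: "[ 1378.962435] aacraid 0000:af:00.0: Issuing IOP reset"
TS_LINE = re.compile(r"^\[\s*([\d.]+)\]\s*(.*)$")
# "outstanding cmd: firmware-32" -> ("firmware", "32")
OUTSTANDING = re.compile(r"outstanding cmd: ([\w ]+)-(\d+)")


def summarize_aacraid(log):
    """Count abort/bus-reset messages, collect outstanding-command counts,
    and measure seconds from the first abort to 'IOP reset succeeded'."""
    events = Counter()
    outstanding = {}
    first_abort = reset_ok = None
    for line in log.splitlines():
        m = TS_LINE.match(line)
        if not m:
            continue  # continuation lines ("Outstanding commands on ...") have no timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if "Host adapter abort request" in msg:
            events["abort"] += 1
            if first_abort is None:
                first_abort = ts
        elif "Host bus reset request" in msg:
            events["bus_reset"] += 1
        elif "IOP reset succeeded" in msg:
            reset_ok = ts
        o = OUTSTANDING.search(msg)
        if o:
            outstanding[o.group(1)] = int(o.group(2))
    stall = None
    if first_abort is not None and reset_ok is not None:
        stall = reset_ok - first_abort
    return events, outstanding, stall


# A few lines copied from the log in this report.
log = """\
[ 1376.837737] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.907678] aacraid: Host adapter abort request.
[ 1378.945954] aacraid: Host bus reset request. SCSI hang ?
[ 1378.946613] aacraid 0000:af:00.0: outstanding cmd: firmware-32
[ 1412.498211] aacraid 0000:af:00.0: IOP reset succeeded
"""
events, outstanding, stall = summarize_aacraid(log)
print(dict(events), outstanding, round(stall, 1))
# {'abort': 2, 'bus_reset': 1} {'firmware': 32} 35.7
```

On the full log above, the same arithmetic gives roughly 35.7 seconds between the first abort request and the successful IOP reset, with 32 commands outstanding in firmware.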

On another server with an Adaptec ASR8405 RAID controller, running exactly the
same distribution and kernel, we don't see this issue at all.

The only major difference is that the affected system has two CPU sockets.
It also has SSD drives, but I don't think that could be the cause.

We have found that this issue has existed since kernel 6.1.53, which
incorporated this patch:
scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity

https://www.spinics.net/lists/stable-commits/msg313381.html
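The patch title describes deriving the reply queue (MSI-X vector) for a command from the CPU that submitted it, using each vector's IRQ affinity. The following is only a simplified Python model of that idea, not the aacraid or blk-mq code: the function names are hypothetical, and letting CPUs that no vector covers fall back to queue 0 is an assumption of this sketch.

```python
def map_cpus_to_queues(vector_affinity, nr_cpus):
    """Build a CPU -> reply-queue table from per-vector affinity sets.

    vector_affinity[q] is the set of CPUs that vector q is affine to.
    Assumptions of this sketch: CPUs covered by no vector use queue 0,
    and if a CPU appears in several masks, the last vector listed wins.
    """
    table = [0] * nr_cpus
    for queue, cpus in enumerate(vector_affinity):
        for cpu in cpus:
            table[cpu] = queue
    return table


def pick_reply_queue(table, submitting_cpu):
    """Choose the reply queue whose vector is affine to the submitting CPU."""
    return table[submitting_cpu]


# Hypothetical layout: 8 CPUs, 4 vectors, each affine to two CPUs.
table = map_cpus_to_queues([{0, 1}, {2, 3}, {4, 5}, {6, 7}], 8)
print(table)                       # [0, 0, 1, 1, 2, 2, 3, 3]
print(pick_reply_queue(table, 5))  # 2
```

The point of such a table is that the completion for a command arrives on a vector whose affinity includes the submitting CPU; the real driver and block layer handle details (offline CPUs, uneven vector counts) that this model ignores.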

I think this ticket is related to the issue:
https://bugzilla.kernel.org/show_bug.cgi?id=217599

as well as this email:
https://lore.kernel.org/regressions/4a639fff-445e-455b-9a31-57368d6b7021@xxxxxxxxxxxxx/

We tested kernel 6.1.55 (the version in Debian Bookworm) with the
above-mentioned patch reverted, and it worked flawlessly.

Could it be related to having multiple CPU sockets? We don't see the issue on
a single-socket system.
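One way to check the multi-socket theory is to look at which CPUs each of the controller's interrupt vectors is affine to and compare that with the CPU-to-socket layout (for example from `lscpu`). A minimal sketch of the parsing side, assuming the controller's IRQ lines in /proc/interrupts end with the name `aacraid` (check this on the actual system) and that /proc/irq/<N>/smp_affinity_list holds a CPU list such as `0-9,20-29`. The helper names and the sample text are made up for illustration; they are not output from the affected servers:

```python
from pathlib import Path


def irqs_for(interrupts_text, name="aacraid"):
    """IRQ numbers whose /proc/interrupts line ends with `name`."""
    irqs = []
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        words = rest.split()
        if sep and head.strip().isdigit() and words and words[-1] == name:
            irqs.append(int(head))
    return irqs


def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '0-3,8' into {0, 1, 2, 3, 8}."""
    cpus = set()
    for part in spec.strip().split(","):
        if part:
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def irq_affinity(irq, proc_irq=Path("/proc/irq")):
    """Read and expand /proc/irq/<irq>/smp_affinity_list on a live system."""
    return parse_cpu_list((proc_irq / str(irq) / "smp_affinity_list").read_text())


# Made-up /proc/interrupts excerpt, for illustration only.
sample = """\
            CPU0       CPU1
 130:         12          0  PCI-MSI-edge      aacraid
 131:          0         40  PCI-MSI-edge      aacraid
 NMI:          0          0  Non-maskable interrupts
"""
print(irqs_for(sample))                 # [130, 131]
print(sorted(parse_cpu_list("0-3,8")))  # [0, 1, 2, 3, 8]
```

On the server itself, calling `irqs_for(Path("/proc/interrupts").read_text())` and then `irq_affinity(n)` for each returned IRQ would give the CPU sets to compare against each socket's CPUs.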

Both systems use Intel Xeon CPUs.



