On 8/22/23 16:41, Niklas Cassel wrote: > >On Mon, Aug 14, 2023 at 12:17:20PM +0800, Szuying Chen wrote: >> On 8/10/23 17:59, Damien Le Moal wrote: >> >On 8/10/23 18:31, Szuying Chen wrote: >> >>> On 8/10/23 14:12, Damien Le Moal wrote: >> >>> On 8/10/23 14:05, Szuying Chen wrote: >> >>> > When ISR handle interface fatal error with error recovery after clear PxIS >> >>> > and PxIE. Another FIS(SDB FIS with err) that set PxIS.IFS to 1 is recevied >> >>> > during error recovery, which causing the HBA not issue any new commands >> >>> > after cmd.ST set 1. >> >>> >> >>> This is not very clear. If there was a fatal error, the drive should be in >> >>> error state and no other SDB FIS can be received as the drive does absolutely >> >>> nothing while in error state (it only waits for a read log 10h command to be> >> >>> issued to get it out of error state). So if you are seeing 2 SDB FIS with >> >>> errors one after the other, you have a buggy device... >> >>> >> >>> However, I may be misunderstanding your issue here. Could you provide more >> >>> details and a dmesg output example of the issue ? >> >> >> >> According to AHCI 1.3.1 specification ch6.1.9, when an R_ERR is >> >> received on an H2D Data FIS in normal operation, the HBA sets >> >> PxIS.IFS to 1 (fatal error) and halts operation. Referring to SATA >> >> 3.0 specification we know the device will halt queued command >> >> processing and transmit SDB FIS to host with ERR bit in Status field set to one(set PxIS.TFES to 1). >> > >> >Sure, but that SBD FIS should be completely ignored by the adapter >> >since it stopped operation. If you see it, then it means that the >> >handling of the first error was incomplete. >> > >> The SDB FIS happens between pxIS.IE was cleared and PxCMD.ST not >> cleared to 0. Because the PxCMD.ST still setting, so the SDB FIS can't ignored. >> >> >> In our case, the ISR handles fatal errors(PxIS.IFS) and enters >> >> error recovery after cleaning up PxIS and PxIE. Then a SDB FIS is >> >> received with interrupt bit(PxIS.TFES) set to 1. According to AHCI >> >> 1.3.1 specification ch6.2.2, HBA can't issue(cmd.ST set to 1) any >> >> new commands under PxIS.TFES alive during error recovery. >> > >> >But how come you see a new command being issued ? This entire >> >situation should result in a port reset with the first error. I do not see how this is possible. >> >Are you saying that the port reset is not cleaning things up properly >> >? Could you share the dmesg output of this case ? >> > >> > >> dmesg: >> [ 654.244958] nouveau 0000:09:00.0: AMD-Vi: Event logged >> [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 654.460522] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 656.180375] ata7.00: >> exception Emask 0x10 SAct 0x7c0 SErr 0x200100 > >Hello Szuying Chen, > >From your logs we can see that SErr 0x200100: >This means that PxSERR.DIAG.C was set. > >This means that: >"Calculated Different CRC than Received: When this occurs, the HBA returns R_ERR and returns to idle." > >(Since PxIS.IFS is set, we know that it was a D2H Data FIS that had invalid CRC. (If it was a D2H non-Data FIS PxIS.INFS would have been set instead.)) > > >So your commit message is wrong. In your case it is the HBA that sends a R_ERR to the device, so you should look at "6.1.10 CRC error in received FIS" and not "6.1.9 Device responds to FIS with R_ERR". > >"When a CRC error occurs on a D2H Data FIS, the HBA sets PxIS.IFS to ‘1’. >After a CRC error on a D2H Data FIS, the device may send a D2H Register FIS to indicate the reason for the error. The host should allow posting of non-Data FISes before PxCMD.ST is cleared to ‘0’." > >Looking at the state machine "5.3.11.2 DR:Receive" you will jump to state ERR:Fatal. (Meaning that we need to clear PxCMD.ST in order to issue new >commands.) > > >The HBA should this trigger an error irq when noticing the bad CRC. > >It is possible that the HBA triggers another error IRQ when receiving a D2H with TFES bit set, if it is received before PxCMD.ST is cleared to 0, see NDR:Accept: >FIS Type is D2H Register and PxCMD.ST=0, jump to P:RegFISUpdate, which does not trigger any IRQ. >If PxCMD.ST is set, the state machine will instead jump to RegFIS:Entry. > > >So, in order to understand what is going on, can you please add unconditional prints in ahci_error_intr(), so we can actually see how many times this is called. (Right now it seems to only be called once). Please also print when PxCMD.ST is cleared to 0. >(So that we can verify that ahci_error_intr() is never called after that). Also please remove all prints from other drivers, e.g. nouveau. > > >Kind regards, >Niklas > > Hi Niklas, In our case, receiving D2H with TFES bit set(PxIS.TFES set to 1) doesn't triggers aother error IRQ before PxCMD.ST is cleared. Because IRQ was turn off(PxIE=0). After received D2H with TFES bit set, port is restart(clearing PxCMD.ST to '0' and then setting PxCMD.ST to '1').In the dmesg output, device is lose after the port restart 3 times. Durring this time, the TFES bit was not cleared. The Linux kernel dmesg output as shown below: [ 40.251795] ahci_error_intr start (irq_stat 0x08000000) **** [ 40.251799] ahci_error_intr: interface fatal error [ 40.251800] ahci_error_intr: freeze (set pxIE to 0) [ 40.251805] ahci_error_intr end (PORT_IRQ_MASK = 0x00000000 )**** [ 40.298113] ata8.00: exception Emask 0x10 SAct 0x70e000 SErr 0x280100 action 0x6 frozen [ 40.298117] ata8.00: irq_stat 0x08000000, interface fatal error [ 40.298119] ata8: SError: { UnrecovData 10B8B BadCRC } [ 40.298121] ata8.00: failed command: READ FPDMA QUEUED [ 40.298122] ata8.00: cmd 60/b8:68:c0:c3:03/08:00:00:00:00/40 tag 13 ncq dma 1142784 in res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 40.298126] ata8.00: status: { DRDY } [ 40.298128] ata8.00: failed command: READ FPDMA QUEUED [ 40.298129] ata8.00: cmd 60/b8:70:78:cc:03/05:00:00:00:00/40 tag 14 ncq dma 749568 in res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 40.298132] ata8.00: status: { DRDY } [ 40.298133] ata8.00: failed command: READ FPDMA QUEUED [ 40.298134] ata8.00: cmd 60/00:78:30:d2:03/0a:00:00:00:00/40 tag 15 ncq dma 1310720 in res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 40.298137] ata8.00: status: { DRDY } [ 40.298138] ata8.00: failed command: READ FPDMA QUEUED [ 40.298139] ata8.00: cmd 60/00:a0:30:dc:03/0a:00:00:00:00/40 tag 20 ncq dma 1310720 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 40.298142] ata8.00: status: { DRDY } [ 40.298143] ata8.00: failed command: READ FPDMA QUEUED [ 40.298144] ata8.00: cmd 60/40:a8:30:e6:03/08:00:00:00:00/40 tag 21 ncq dma 1081344 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 40.298147] ata8.00: status: { DRDY } [ 40.298148] ata8.00: failed command: READ FPDMA QUEUED [ 40.298149] ata8.00: cmd 60/90:b0:70:ee:03/09:00:00:00:00/40 tag 22 ncq dma 1253376 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 40.298151] ata8.00: status: { DRDY } [ 40.298154] ata8: hard resetting link [ 40.298155] ata8: ahci_hardreset ***** [ 40.298160] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004d517) [ 40.298166] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 40.298167] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 40.608009] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 40.608031] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 40.608041] ata8: ahci_softreset ***** [ 40.608049] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 40.608059] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 40.608064] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 40.608069] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 40.608079] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 50.199215] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 50.199236] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 50.199242] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 50.199254] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 50.199264] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 50.199269] ata8: softreset failed (1st FIS failed) [ 50.199278] ata8: hard resetting link [ 50.199283] ata8: ahci_hardreset ***** [ 50.199289] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 50.199298] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 50.199302] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 50.514443] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 50.514464] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 50.514474] ata8: ahci_softreset ***** [ 50.514482] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 50.514492] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 50.514496] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 50.514502] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 50.514511] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 60.173000] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 60.173020] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 60.173026] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 60.173038] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 60.173048] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 60.173053] ata8: softreset failed (1st FIS failed) [ 60.173063] ata8: hard resetting link [ 60.173067] ata8: ahci_hardreset ***** [ 60.173073] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 60.173082] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 60.173086] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 60.488241] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 60.488262] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 60.488272] ata8: ahci_softreset ***** [ 60.488280] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 60.488289] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 60.488294] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 60.488300] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 60.488309] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 95.144383] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 95.144402] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 95.144409] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 95.144420] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 95.144430] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 95.144435] ata8: softreset failed (1st FIS failed) [ 95.144450] ata8: limiting SATA link speed to 3.0 Gbps [ 95.144454] ata8: hard resetting link [ 95.144458] ata8: ahci_hardreset ***** [ 95.144464] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 95.144473] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 95.144477] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 95.459765] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 95.459785] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 95.459795] ata8: ahci_softreset ***** [ 95.459803] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 95.459813] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 95.459817] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 95.459823] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 95.459832] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 100.142968] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) [ 100.142987] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 100.142993] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 100.143005] ahci 0000:6f:00.0: ahci_start_engine start (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) *** [ 100.143015] ahci 0000:6f:00.0: ahci_start_engine end (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c017) *** [ 100.143020] ata8: softreset failed (1st FIS failed) [ 100.143035] ata8: reset failed, giving up [ 100.143040] ata8.00: disable device [ 100.143073] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x00000000 PORT_CMD 0x00044016) [ 100.143079] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 100.143095] ata8: EH complete [ 100.143157] sd 7:0:0:0: [sdb] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=60s [ 100.143166] sd 7:0:0:0: [sdb] tag#13 CDB: Read(10) 28 00 00 03 c3 c0 00 08 b8 00 [ 100.143170] I/O error, dev sdb, sector 246720 op 0x0:(READ) flags 0x4000 phys_seg 168 prio class 2 [ 100.143196] sd 7:0:0:0: [sdb] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=60s [ 100.143201] sd 7:0:0:0: [sdb] tag#14 CDB: Read(10) 28 00 00 03 cc 78 00 05 b8 00 [ 100.143204] I/O error, dev sdb, sector 248952 op 0x0:(READ) flags 0x0 phys_seg 88 prio class 2 [ 100.143226] sd 7:0:0:0: [sdb] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s [ 100.143238] sd 7:0:0:0: [sdb] tag#24 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00 [ 100.143279] sd 7:0:0:0: [sdb] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=60s [ 100.143287] sd 7:0:0:0: [sdb] tag#15 CDB: Read(10) 28 00 00 03 d2 30 00 0a 00 00 [ 100.143291] I/O error, dev sdb, sector 250416 op 0x0:(READ) flags 0x4000 phys_seg 112 prio class 2 [ 100.143313] sd 7:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=60s [ 100.143322] sd 7:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 03 dc 30 00 0a 00 00 [ 100.143327] I/O error, dev sdb, sector 252976 op 0x0:(READ) flags 0x4000 phys_seg 81 prio class 2 [ 100.143346] sd 7:0:0:0: [sdb] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=60s [ 100.143353] sd 7:0:0:0: [sdb] tag#17 CDB: Read(10) 28 00 00 03 e6 30 00 08 40 00 [ 100.143357] I/O error, dev sdb, sector 255536 op 0x0:(READ) flags 0x4000 phys_seg 65 prio class 2 [ 100.143499] sd 7:0:0:0: [sdb] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=60s [ 100.143508] sd 7:0:0:0: [sdb] tag#18 CDB: Read(10) 28 00 00 03 ee 70 00 09 90 00 [ 100.143513] I/O error, dev sdb, sector 257648 op 0x0:(READ) flags 0x0 phys_seg 70 prio class 2 The Linux(add clear pending interrupt) kernel dmesg output as shown below: [ 331.569303] ahci_error_intr start (irq_stat 0x08000000) **** [ 331.569320] ahci_error_intr: interface fatal error [ 331.569324] ahci_error_intr: freeze (set pxIE to 0) [ 331.569345] ahci_error_intr end (PORT_IRQ_MASK = 0x00000000 )**** [ 331.599010] ata8.00: exception Emask 0x10 SAct 0x18 SErr 0x280100 action 0x6 frozen [ 331.599026] ata8.00: irq_stat 0x08000000, interface fatal error [ 331.599032] ata8: SError: { UnrecovData 10B8B BadCRC } [ 331.599042] ata8.00: failed command: READ FPDMA QUEUED [ 331.599047] ata8.00: cmd 60/00:18:e0:3d:19/0a:00:00:00:00/40 tag 3 ncq dma 1310720 in res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 331.599064] ata8.00: status: { DRDY } [ 331.599070] ata8.00: failed command: READ FPDMA QUEUED [ 331.599074] ata8.00: cmd 60/90:20:e0:47:19/06:00:00:00:00/40 tag 4 ncq dma 860160 in res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ 331.599089] ata8.00: status: { DRDY } [ 331.599096] ata8: hard resetting link [ 331.599100] ata8: ahci_hardreset ***** [ 331.599107] ahci 0000:6f:00.0: ahci_stop_engine start *** ( PORT_IRQ_STAT 0x40000008 PORT_CMD 0x0004c317) [ 331.599118] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x40000008 PORT_CMD 0x00044016) [ 331.599123] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 331.914440] ahci 0000:6f:00.0: ahci_start_engine: check PORT_SCR_ERR 0x00000000, and to clear [ 331.914447] ahci 0000:6f:00.0: ahci_start_engine: check PORT_IRQ_STAT 0x40000008, and to clear [ 331.914456] ata8: ahci_softreset ***** [ 331.914466] ahci 0000:6f:00.0: ahci_stop_engine: setting HBA to idle (PORT_IRQ_STAT 0x00000000 PORT_CMD 0x00044016) [ 331.914470] ahci 0000:6f:00.0: ahci_stop_engine end *** [ 331.914473] ahci 0000:6f:00.0: ahci_start_engine: check PORT_SCR_ERR 0x00000000, and to clear [ 331.914477] ahci 0000:6f:00.0: ahci_start_engine: check PORT_IRQ_STAT 0x00000000, and to clear [ 332.075031] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 332.075718] ata8.00: configured for UDMA/133 [ 332.075739] ata8: EH complete Thanks. > >> action 0x6 frozen >> [ 656.180384] ata7.00: irq_stat 0x08000000, interface fatal error [ >> 656.180386] ata7: SError: { UnrecovData BadCRC } [ 656.180389] >> ata7.00: failed command: READ FPDMA QUEUED [ 656.180390] ata7.00: cmd >> 60/00:30:28:a9:9c/0a:00:01:00:00/40 tag 6 ncq dma 1310720 in >> res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 656.180395] ata7.00: status: { DRDY } [ 656.180397] ata7.00: failed >> command: READ FPDMA QUEUED [ 656.180398] ata7.00: cmd >> 60/00:38:28:b3:9c/0a:00:01:00:00/40 tag 7 ncq dma 1310720 in >> res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 656.180402] ata7.00: status: { DRDY } [ 656.180403] ata7.00: failed >> command: READ FPDMA QUEUED [ 656.180404] ata7.00: cmd >> 60/00:40:28:bd:9c/0a:00:01:00:00/40 tag 8 ncq dma 1310720 in >> res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 656.180408] ata7.00: status: { DRDY } [ 656.180410] ata7.00: failed >> command: READ FPDMA QUEUED [ 656.180411] ata7.00: cmd >> 60/00:48:28:c7:9c/0a:00:01:00:00/40 tag 9 ncq dma 1310720 in >> res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 656.180414] ata7.00: status: { DRDY } [ 656.180416] ata7.00: failed >> command: READ FPDMA QUEUED [ 656.180417] ata7.00: cmd >> 60/00:50:28:d1:9c/0a:00:01:00:00/40 tag 10 ncq dma 1310720 in >> res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 656.180421] ata7.00: status: { DRDY } [ 656.180423] ata7: hard >> resetting link [ 659.210328] amd_iommu_report_page_fault: 48 >> callbacks suppressed [ 659.210333] nouveau 0000:09:00.0: AMD-Vi: >> Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.210345] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.210352] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.811069] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.811080] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.811086] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.827669] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.827676] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.827680] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 659.878319] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.426805] >> amd_iommu_report_page_fault: 248 callbacks suppressed [ 664.426809] >> nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.426819] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.426825] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.676294] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.676305] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.676313] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.693501] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.693513] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.693521] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 664.711480] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 666.180398] ata7: softreset >> failed (1st FIS failed) [ 666.180405] ata7: hard resetting link [ >> 670.054987] amd_iommu_report_page_fault: 99 callbacks suppressed [ >> 670.054992] nouveau 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 670.055003] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] [ 670.055010] nouveau >> 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT >> domain=0x0019 address=0x0 flags=0x0000] >> >> >> dmesg (add clear pending Interrupt): >> [ 120.236847] ahci_error_intr: interface fatal error [ 120.236854] >> ahci_error_intr: freeze (set PxIE to 0) [ 120.265268] ata7: limiting >> SATA link speed to 3.0 Gbps [ 120.265275] ata7.00: exception Emask >> 0x10 SAct 0x3f0 SErr 0x400000 action 0x6 frozen [ 120.265279] >> ata7.00: irq_stat 0x08000000, interface fatal error [ 120.265281] >> ata7: SError: { Handshk } [ 120.265285] ata7.00: failed command: >> WRITE FPDMA QUEUED [ 120.265287] ata7.00: cmd >> 61/00:20:a8:b9:50/0a:00:00:00:00/40 tag 4 ncq dma 1310720 ou >> res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 120.265295] ata7.00: status: { DRDY } [ 120.265297] ata7.00: failed >> command: WRITE FPDMA QUEUED [ 120.265299] ata7.00: cmd >> 61/20:28:a8:c3:50/03:00:00:00:00/40 tag 5 ncq dma 409600 out >> res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 120.265305] ata7.00: status: { DRDY } [ 120.265307] ata7.00: failed >> command: WRITE FPDMA QUEUED [ 120.265309] ata7.00: cmd >> 61/00:30:c8:c6:50/0a:00:00:00:00/40 tag 6 ncq dma 1310720 ou >> res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 120.265314] ata7.00: status: { DRDY } [ 120.265316] ata7.00: failed >> command: WRITE FPDMA QUEUED [ 120.265318] ata7.00: cmd >> 61/00:38:c8:d0:50/0a:00:00:00:00/40 tag 7 ncq dma 1310720 ou >> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 120.265324] ata7.00: status: { DRDY } [ 120.265325] ata7.00: failed >> command: WRITE FPDMA QUEUED [ 120.265327] ata7.00: cmd >> 61/00:40:c8:da:50/0a:00:00:00:00/40 tag 8 ncq dma 1310720 ou >> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 120.265333] ata7.00: status: { DRDY } [ 120.265334] ata7.00: failed >> command: WRITE FPDMA QUEUED [ 120.265336] ata7.00: cmd >> 61/00:48:c8:e4:50/0a:00:00:00:00/40 tag 9 ncq dma 1310720 ou >> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error) [ >> 120.265342] ata7.00: status: { DRDY } [ 120.265346] ata7: hard >> resetting link [ 120.265348] ata7: ahci_hardreset ***** [ >> 120.265352] ahci 0000:0a:00.0: ahci_stop_engine: PORT_IRQ_STAT >> 0x40000008 PORT_CMD 0x0004c617 [ 120.265356] ahci 0000:0a:00.0: >> ahci_stop_engine: setting HBA to idle (setting PxCMD.ST to 0) [ >> 120.937522] ahci 0000:0a:00.0: ahci_start_engine: check PORT_SCR_ERR >> 0x00000000, and to clear [ 120.937534] ahci 0000:0a:00.0: >> ahci_start_engine: check PORT_IRQ_STAT 0x40000008, and to clear [ >> 120.937544] ata7: ahci_softreset ***** [ 120.937550] ahci >> 0000:0a:00.0: ahci_stop_engine: setting HBA to idle (setting PxCMD.ST >> to 0) [ 120.937555] ahci 0000:0a:00.0: ahci_start_engine: check >> PORT_SCR_ERR 0x00000000, and to clear [ 120.937559] ahci >> 0000:0a:00.0: ahci_start_engine: check PORT_IRQ_STAT 0x00000000, and >> to clear [ 121.097231] ata7: SATA link up 3.0 Gbps (SStatus 123 >> SControl 320) [ 121.113493] ata7.00: configured for UDMA/133 [ >> 121.113570] ata7: EH complete >> >> I have printk PxIS & PxCMD values to check pending interrupt. >> It SBD FIS with ERR to setting PxIS.TFES and PxIS.SDBS bit. >> >> >> >> >>> > >> >>> > Signed-off-by: Szuying Chen <Chloe_Chen@xxxxxxxxxxxxxx> >> >>> > --- >> >>> > drivers/ata/libahci.c | 12 ++++++++++++ >> >>> > 1 file changed, 12 insertions(+) >> >>> > >> >>> > diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c >> >>> > index 06aec35f88f2..0ae51fd95d46 100644 >> >>> > --- a/drivers/ata/libahci.c >> >>> > +++ b/drivers/ata/libahci.c >> >>> > @@ -679,9 +679,21 @@ static int ahci_scr_write(struct ata_link *link, unsigned int sc_reg, u32 val) >> >>> > >> >>> > void ahci_start_engine(struct ata_port *ap) >> >>> > { >> >>> > + struct ahci_host_priv *hpriv = ap->host->private_data; >> >>> > void __iomem *port_mmio = ahci_port_base(ap); >> >>> > u32 tmp; >> >>> > >> >>> > + /* clear SError */ >> >>> > + tmp = readl(port_mmio + PORT_SCR_ERR); >> >>> > + writel(tmp, port_mmio + PORT_SCR_ERR); >> >>> > + >> >>> > + /* clear port IRQ */ >> >>> > + tmp = readl(port_mmio + PORT_IRQ_STAT); >> >>> > + if (tmp) >> >>> > + writel(tmp, port_mmio + PORT_IRQ_STAT); >> >>> > + >> >>> > + writel(1 << ap->port_no, hpriv->mmio + PORT_IRQ_STAT); >> >>> > + >> >>> > /* start DMA */ >> >>> > tmp = readl(port_mmio + PORT_CMD); >> >>> > tmp |= PORT_CMD_START; >> >>> > -- >> >>> > 2.39.2 >> >>> > >> >>> >> >>> -- >> >>> Damien Le Moal >> >>> Western Digital Research >> >>> >> >> Thanks. >> >> >> > >> >-- >> >Damien Le Moal >> >Western Digital Research ><p></p> >