Re: Bug: Completion-Wait loop timed out with vfio

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/28/2023 1:03 PM, Tasos Sahanidis wrote:
> Thank you very much for your quick response, Abhishek.
> 
>> 1. Set disable_idle_d3 module parameter set and check if this issue happens.
> The issue does not happen with disable_idle_d3, which means I can at
> least now use newer kernels. All the following commands were ran
> *without* disable_idle_d3, so that the issue would occur.
> 
>> 2. Without starting the VM, check the status of following sysfs entries.
> I assume by /sys/bus/pci/devices/<B:D:F>/power/power_state you meant
> /sys/bus/pci/devices/<B:D:F>/power_state, as the former doesn't exist.
> 
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> suspended
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D3hot
> 

 So D3cold is not supported on this system.
 Most of the desktop systems doesn’t support D3cold. 
 In that case, as Alex mentioned that after that patch the root port can also
 go into D3hot state.
 
 Another difference is that earlier we were changing the device power state by
 directly writing into PCI PM_CTRL registers. Now, we are using kernel generic
 runtime PM function to perform the same.

 We need to print the root port runtime status and power_state as Alex mentioned.

 Apart from that, can we try following things to get more information,

 Before binding the Device to vfio-pci driver, disable the runtime power
 management of the root port
 
 # echo on > /sys/bus/pci/devices/<root_port B:D:F>/power/control

 After this, bind the device to vfio-pci driver and check the runtime status and power_state
 for both device and root port. The root port runtime_status should be active and power_state
 should be D0.

 With the runtime PM disabled for the root port, check if this issue happens.
 It will give clue if the root port going into D3hot status is causing the issue or
 the use of runtime PM to put device into D3hot is causing this. 

>> 3. After issue happens, run the above command again.
> This is with the VM running and the errors in dmesg:
> 
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power/runtime_status
> active
> # cat /sys/bus/pci/devices/0000\:06\:00.0/power_state
> D0
> 
>> 4. Do lspci -s <B:D:F> -vvv without starting the VM and see if it is printing the correct
>>    results and there is no new prints in the dmesg.
> This is from before the VM was started:
> 
> # lspci -s 0000:06:00.0 -vvv
> 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] (prog-if 00 [VGA controller])
> 	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Interrupt: pin A routed to IRQ 255
> 	Region 0: Memory at <ignored> (64-bit, prefetchable) [disabled]
> 	Region 2: Memory at <ignored> (64-bit, prefetchable) [disabled]
> 	Region 4: I/O ports at d000 [disabled] [size=256]
> 	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=256K]
> 	Expansion ROM at fca40000 [disabled] [size=128K]
> 	Capabilities: [48] Vendor Specific Information: Len=08 <?>
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> 		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
> 		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s (ok), Width x4 (downgraded)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
> 			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 1
> 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> 			 FRS-
> 			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 			 AtomicOpsCtl: ReqEn-
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> 		Address: 0000000000000000  Data: 0000
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [270 v1] Secondary PCI Express
> 		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> 		LaneErrStat: 0
> 	Capabilities: [2b0 v1] Address Translation Service (ATS)
> 		ATSCap:	Invalidate Queue Depth: 00
> 		ATSCtl:	Enable+, Smallest Translation Unit: 00
> 	Capabilities: [2c0 v1] Page Request Interface (PRI)
> 		PRICtl: Enable- Reset-
> 		PRISta: RF- UPRGI- Stopped+
> 		Page Request Capacity: 00000020, Page Request Allocation: 00000000
> 	Capabilities: [2d0 v1] Process Address Space ID (PASID)
> 		PASIDCap: Exec+ Priv+, Max PASID Width: 10
> 		PASIDCtl: Enable- Exec- Priv-
> 	Kernel driver in use: vfio-pci
> 	Kernel modules: amdgpu
> 
> 
> This is the diff from while the VM was running:
> 
> @@ -1,25 +1,25 @@
>  root@tasos-Standard-PC-Q35-ICH9-2009:~# lspci -s 0000:06:00.0 -vvv
>  06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM] (prog-if 00 [VGA controller])
>  	Subsystem: ASUSTeK Computer Inc. Radeon HD 7790 DirectCU II OC
> -	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> +	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> -	Interrupt: pin A routed to IRQ 255
> -	Region 0: Memory at <ignored> (64-bit, prefetchable) [disabled]
> -	Region 2: Memory at <ignored> (64-bit, prefetchable) [disabled]
> -	Region 4: I/O ports at d000 [disabled] [size=256]
> -	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=256K]
> +	Interrupt: pin A routed to IRQ 42
> +	Region 0: Memory at <ignored> (64-bit, prefetchable)
> +	Region 2: Memory at <ignored> (64-bit, prefetchable)
> +	Region 4: I/O ports at d000 [size=256]
> +	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [size=256K]
>  	Expansion ROM at fca40000 [disabled] [size=128K]
>  	Capabilities: [48] Vendor Specific Information: Len=08 <?>
>  	Capabilities: [50] Power Management version 3
>  		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> -		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
> +		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>  	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>  		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>  			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>  		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>  			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>  			MaxPayload 128 bytes, MaxReadReq 512 bytes
> -		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
> +		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>  		LnkCap:	Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>  			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>  		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 
> 
> If I am reading this correctly, the card claims to not support D3cold.
> 
> Nothing extra was printed in dmesg while running the lspci commands.
> 
> Regarding the "Memory at <ignored> (64-bit, prefetchable)" lines, this
> is because of presumably a firmware bug where it doesn't correctly map
> the secondary graphics card in the address space.
> 
> On my main installation, I boot with pci=realloc which fixes it, but due
> to the filesystem corruption risk, I am performing this testing on a
> separate installation with the rest of the drives unmounted, and I
> forgot to add pci=realloc. The behaviour in regards to the power
> management issue is the same in both cases (it was originally discovered
> with pci=realloc set).
> 

 This  “Completion-Wait loop timed out with vfio” prints is coming
 from the IOMMU driver. Can you please check once by adding ‘pci=realloc’
 in your separate installation and see if we the memory are enabled after
 D3hot cycles. If memory is getting disabled only after D3hot cycles with
 ‘pci=realloc’, then we need to find out at which stage it is happening
 (when the device is going into D3hot or when root port is going into D3hot).

 For this we can disable the runtime PM of both device and root port before
 binding the device to vfio-pci driver. Then enable runtime PM of device first
 and wait for it to go into suspended state. Then check lspci output. 
 Then enable the same for root port and check lspci output.

>> 5. Enable the ftrace events related with runtime power management before starting the VM
> I captured the trace, but
> $ wc -l trace
> 41129 trace
> 
> It doesn't sound like a good idea to send the contents of that entire
> file. Is there something specific you'd like me to filter for?
> 
> Perhaps this is a bit better?:
> $ grep "KVM\|qemu\|0000:06:00" trace | wc -l
> 719
> 
> If not, I can upload the entire file and send a link, although I don't
> know if it will be caught in a spam filter.
> 

 Mainly I was looking for the prints related with the device, root port
 and any other device present under the root port (like audio function of the GPU).
 But given these logs are huge and won’t help much here so you can skip this.

 Thanks,
 Abhishek

>>  6. Do you have any NVIDIA graphics card with you. If you have, then
>>     could you please check if issue happens with that.
> Unfortunately, no, I do not have any NVIDIA cards.
> 
> I tried passing through a:
> 0a:00.0 Serial controller [0700]: MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller [9710:9912] (prog-if 02 [16550])
> which claims:
> 	Capabilities: [78] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 
> and I did not get any messages/errors in dmesg.
> 
> Thank you
> 
> --
> Tasos
> 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux