Re: Bug: Completion-Wait loop timed out with vfio

On Wed, 1 Mar 2023 12:34:32 +0200
Tasos Sahanidis <tasos@xxxxxxxxxxxx> wrote:

> On 2023-02-28 20:46, Alex Williamson wrote:
> > Can you do the same for the root port to the GPU, ex. use lspci -t to
> > find the parent root port.  Since the device doesn't seem to be
> > achieving D3cold (expected on a desktop system), the other significant
> > change of the identified commit is that the root port will also enter a
> > low power state.  Prior to that commit the device would enter D3hot, but
> > we never touched the root port.  Perhaps confirm the root port now
> > enters D3hot and compare lspci for the root port when using
> > disable_idle_d3 to that found when trying to use the device without
> > disable_idle_d3. Thanks,
> > 
> > Alex
> >   
> 
> I seem to have trouble understanding the lspci tree.
> 
> The tree is as follows:
> 
> -[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
> [...]      |
>            +-01.2-[02-0d]----00.0-[03-0d]--+-01.0-[04-05]----00.0-[05]--+-00.0  Creative Labs EMU10k2/CA0100/CA0102/CA10200 [Sound Blaster Audigy Series]
>            |                               |                            +-00.1  Creative Labs SB Audigy Game Port
>            |                               |                            +-01.0  Brooktree Corporation Bt878 Video Capture
>            |                               |                            \-01.1  Brooktree Corporation Bt878 Audio Capture
>            |                               +-02.0-[06]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM]
>            |                               |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Tobago HDMI Audio [Radeon R7 360 / R9 360 OEM]
>            |                               +-03.0-[07-08]----00.0-[08]--+-00.0  Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder
>            |                               |                            \-01.0  Yamaha Corporation YMF-744B [DS-1S Audio Controller]
>            |                               +-05.0-[09]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>            |                               +-06.0-[0a]--+-00.0  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               |            +-00.1  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               |            \-00.2  MosChip Semiconductor Technology Ltd. PCIe 9912 Multi-I/O Controller
>            |                               +-08.0-[0b]--+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
>            |                               |            +-00.1  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
>            |                               |            \-00.3  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
>            |                               +-09.0-[0c]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
>            |                               \-0a.0-[0d]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
> [...]      |
> 
> The parent root port is either 0000:00:01.2 or 0000:00:02.0, correct?

The topology is a bit more complex than usual.  The root port is indeed
0000:00:01.2, but there is a PCIe switch below it.

> 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
> 02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream

0000:02:00.0 is the upstream port of that switch and 0000:03:02.0 is
the downstream port for the 7790.  0000:03:02.0 is the port that should
also now enter D3hot.
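
As an alternative to decoding the lspci -t tree by hand, the same parent
chain can be read from sysfs: each entry under /sys/bus/pci/devices is a
symlink whose resolved path names every bridge between the root complex
and the device.  Below is a minimal Python sketch of that idea.  The helper
name bridge_chain is made up for illustration, and the example path in the
comment is what one would expect for the topology quoted above, not
output captured from the reporter's system.

```python
import os
import re

# A PCI address as it appears in sysfs: domain:bus:device.function
_BDF = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")


def bridge_chain(resolved_path):
    """Return the PCI addresses found in a resolved sysfs device path.

    The list is ordered from the root port down to the device itself.
    Components such as "pci0000:00" (the host bridge) are skipped because
    they do not have the full domain:bus:device.function form.
    """
    parts = resolved_path.strip("/").split("/")
    return [p for p in parts if _BDF.fullmatch(p)]


# Hypothetical usage on the affected machine (GPU at 0000:06:00.0):
#   path = os.path.realpath("/sys/bus/pci/devices/0000:06:00.0")
#   bridge_chain(path)
# Expected shape of `path` for the tree above (an assumption):
#   /sys/devices/pci0000:00/0000:00:01.2/0000:02:00.0/0000:03:02.0/0000:06:00.0
```

With a path of that shape, the first element is the root port, the last
is the GPU, and the element just before the last is the switch downstream
port that matters here.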

> If so, I tested in 5.18, both before and while running the VM, with 6.2
> both with and without disable_idle_d3, and in all cases they stayed at D0.

In this case the upstream port should always stay in D0, since it has
quite a lot more devices under it than just the GPU.  It's interesting
that the MosChip that assigns ok is also under a downstream port of this
switch.  That means the downstream port 0000:03:06.0 should also be
entering D3hot when all of the MosChip devices are attached to vfio-pci
and unused.
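
One way to check this without lspci is a short Python sketch like the
one below.  It assumes the kernel exposes a power_state attribute and a
driver symlink under /sys/bus/pci/devices/<address>/, which should be
verified on the kernel in use.  The function names and the sysfs_root
parameter are hypothetical and exist only for this illustration.

```python
import os
from pathlib import Path

SYSFS_PCI = "/sys/bus/pci/devices"


def pci_power_state(bdf, sysfs_root=SYSFS_PCI):
    """Return the device's power_state string (e.g. "D0", "D3hot"),
    or None if the attribute is missing or unreadable."""
    try:
        return (Path(sysfs_root) / bdf / "power_state").read_text().strip()
    except OSError:
        return None


def all_bound_to(bdfs, driver="vfio-pci", sysfs_root=SYSFS_PCI):
    """Return True only if every listed function is bound to `driver`.

    The bound driver is taken from the name of the target of the
    per-device "driver" symlink; an unbound device has no such link.
    """
    for bdf in bdfs:
        link = Path(sysfs_root) / bdf / "driver"
        if not link.is_symlink():
            return False
        if os.path.basename(os.readlink(link)) != driver:
            return False
    return True


# Hypothetical usage for the MosChip functions and their downstream port:
#   if all_bound_to(["0000:0a:00.0", "0000:0a:00.1", "0000:0a:00.2"]):
#       print(pci_power_state("0000:03:06.0"))
```

The same two calls with 0000:06:00.0, 0000:06:00.1 and port 0000:03:02.0
would cover the GPU case.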

I'm not convinced that the MosChip assignment is a good comparison
though.  As a "multi-i/o" controller, it's possible that it doesn't
actually make use of DMA that would trigger the IOMMU like the GPU
does.  Do you have a NIC you could replace one of these with?

It's possible the switch has a problem with D3hot support and it may
need to be disabled or augmented with a PCI quirk.  In addition to
investigating what power state the downstream port is achieving and
reporting lspci -vvv with and without disable_idle_d3, would you mind
also posting the output of "lspci -nns 2:00.0" and "lspci -nns 3:" so
that we have all the vendor and device IDs of the switch?  Thanks,

Alex



