Re: [PATCH 2/2] PCI: Create device link for NVIDIA GPU

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Fri, 31 May 2019 15:39:08 -0500

[+cc Lukas, author of 07f4f97d7b4b ("vga_switcheroo: Use device link
for HDA controller")]

On Fri, May 31, 2019 at 10:31:09AM +0530, Abhishek Sahu wrote:
> NVIDIA Turing GPUs include hardware support for USB Type-C and
> VirtualLink. It helps in delivering the power, display, and data
> required to power VR headsets through a single USB Type-C connector.
> The Turing GPU is a multi-function PCI device has the following
> four functions:
> 
> 	- VGA display controller (Function 0)
> 	- Audio controller (Function 1)
> 	- USB xHCI Host controller (Function 2)
> 	- USB Type-C USCI controller (Function 3)
> 
> The function 0 is tightly coupled with other functions in the
> hardware. When function 0 goes in runtime suspended state,

"Runtime suspended" is a Linux concept, not a PCI concept.  Please
replace this with the appropriate PCI term, e.g., "D3hot" or whatever
it is.

> then it will do power gating for most of the hardware blocks.
> Some of these hardware blocks are used by other functions which
> leads to functional failure. So if any of these functions (1/2/3)
> are active, then function 0 should also be in active state.

Instead of "active" and "active state", please use the specific states
required in terms of PCI.

> 'commit 07f4f97d7b4b ("vga_switcheroo: Use device link for
> HDA controller")' creates the device link from function 1 to
> function 0. A similar kind of device link needs to be created
> between function 0 and functions 2 and 3 for NVIDIA Turing GPU.

I can't point to language that addresses this, but this sounds like a
case of the GPU not conforming to the PCI spec.  The general
assumption is that the OS should be able to discover everything it
needs to do power management directly from the architected PCI config
space.

It is definitely not ideal to have to add quirks like this for devices
designed this way.  Such quirks force us to do otherwise unnecessary
OS updates as new devices are released.

If all the devices in a multi-function device were connected
intimately enough that they all had to be managed by the same driver,
I could imagine putting these non-discoverable dependencies in the
driver.  But these devices don't seem to be related in that way.

If there *is* spec language that allows dependencies like this, please
include the citation in your commit log.

> This patch does the same and create the required device links. It
> will make function 0 to be runtime PM active if other functions
> are still active.
> 
> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
> ---
>  drivers/pci/quirks.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index a20f7771a323..afdbc199efc5 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4967,6 +4967,29 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMD, PCI_ANY_ID,
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>  			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
>  
> +/* Create device link for NVIDIA GPU with integrated USB controller to VGA. */
> +static void quirk_gpu_usb(struct pci_dev *usb)
> +{
> +	pci_create_device_link_with_vga(usb, 2);
> +}
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> +			      PCI_CLASS_SERIAL_USB, 8, quirk_gpu_usb);
> +
> +/*
> + * Create device link for NVIDIA GPU with integrated Type-C UCSI controller
> + * to VGA. Currently there is no class code defined for UCSI device over PCI
> + * so using UNKNOWN class for now and it will be updated when UCSI
> + * over PCI gets a class code.

Ugh.  Here's a good example of having to do yet another OS update.

> + */
> +#define PCI_CLASS_SERIAL_UNKNOWN	0x0c80
> +static void quirk_gpu_usb_typec_ucsi(struct pci_dev *ucsi)
> +{
> +	pci_create_device_link_with_vga(ucsi, 3);
> +}
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> +			      PCI_CLASS_SERIAL_UNKNOWN, 8,
> +			      quirk_gpu_usb_typec_ucsi);
> +
>  /*
>   * Some IDT switches incorrectly flag an ACS Source Validation error on
>   * completions for config read requests even though PCIe r4.0, sec
> -- 
> 2.17.1
>