On Thu, 23 Jan 2025 17:48:52 +0000
<ankita@xxxxxxxxxx> wrote:

> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> NVIDIA's recently introduced Grace Blackwell (GB) Superchip is a
> continuation with the Grace Hopper (GH) superchip that provides a
> cache coherent access to CPU and GPU to each other's memory with
> an internal proprietary chip-to-chip cache coherent interconnect.
>
> There is a HW defect on GH systems to support the Multi-Instance
> GPU (MIG) feature [1] that necessitated the presence of a 1G region
> with uncached mapping carved out from the device memory. The 1G
> region is shown as a fake BAR (comprising region 2 and 3) to
> workaround the issue. This is fixed on the GB systems.
>
> The presence of the fix for the HW defect is communicated by the
> device firmware through the DVSEC PCI config register with ID 3.
> The module reads this to take a different codepath on GB vs GH.
>
> Scan through the DVSEC registers to identify the correct one and use
> it to determine the presence of the fix. Save the value in the device's
> nvgrace_gpu_pci_core_device structure.
>
> Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
>
> CC: Jason Gunthorpe <jgg@xxxxxxxxxx>
> CC: Kevin Tian <kevin.tian@xxxxxxxxx>
> Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
> ---
>  drivers/vfio/pci/nvgrace-gpu/main.c | 30 +++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index a467085038f0..dde2daa597f8 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -23,6 +23,11 @@
>  /* A hardwired and constant ABI value between the GPU FW and VFIO driver. */
>  #define MEMBLK_SIZE SZ_512M
>
> +#define DVSEC_BITMAP_OFFSET 0xA
> +#define MIG_SUPPORTED_WITH_CACHED_RESMEM BIT(0)
> +
> +#define GPU_CAP_DVSEC_REGISTER 3
> +
>  /*
>   * The state of the two device memory region - resmem and usemem - is
>   * saved as struct mem_region.
> @@ -46,6 +51,7 @@ struct nvgrace_gpu_pci_core_device {
>  	struct mem_region resmem;
>  	/* Lock to control device memory kernel mapping */
>  	struct mutex remap_lock;
> +	bool has_mig_hw_bug;
>  };
>
>  static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev)
> @@ -812,6 +818,26 @@ nvgrace_gpu_init_nvdev_struct(struct pci_dev *pdev,
>  	return ret;
>  }
>
> +static bool nvgrace_gpu_has_mig_hw_bug(struct pci_dev *pdev)
> +{
> +	int pcie_dvsec;
> +	u16 dvsec_ctrl16;
> +
> +	pcie_dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_NVIDIA,
> +					       GPU_CAP_DVSEC_REGISTER);
> +
> +	if (pcie_dvsec) {
> +		pci_read_config_word(pdev,
> +				     pcie_dvsec + DVSEC_BITMAP_OFFSET,
> +				     &dvsec_ctrl16);
> +
> +		if (dvsec_ctrl16 & MIG_SUPPORTED_WITH_CACHED_RESMEM)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  static int nvgrace_gpu_probe(struct pci_dev *pdev,
>  			     const struct pci_device_id *id)
>  {
> @@ -832,6 +858,8 @@ static int nvgrace_gpu_probe(struct pci_dev *pdev,
>  	dev_set_drvdata(&pdev->dev, &nvdev->core_device);
>
>  	if (ops == &nvgrace_gpu_pci_ops) {
> +		nvdev->has_mig_hw_bug = nvgrace_gpu_has_mig_hw_bug(pdev);
> +
>  		/*
>  		 * Device memory properties are identified in the host ACPI
>  		 * table. Set the nvgrace_gpu_pci_core_device structure.
> @@ -868,6 +896,8 @@ static const struct pci_device_id nvgrace_gpu_vfio_pci_table[] = {
>  	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2345) },
>  	/* GH200 SKU */
>  	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2348) },
> +	/* GB200 SKU */
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2941) },
>  	{}
>  };
>

GB support isn't really complete until patch 3, so shouldn't we hold
off on adding the ID to the table until a trivial patch 4, adding only
the chunk above?  Thanks,

Alex
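
For reference, the decision made by nvgrace_gpu_has_mig_hw_bug() in the
quoted patch can be modelled in userspace roughly as below. This is only
a sketch under stated assumptions: read_cfg_word() and the flat cfg
array are hypothetical stand-ins for the kernel's config-space
accessors, and the two constants are copied from the patch. A dvsec_pos
of 0 models the "capability not found" return of
pci_find_dvsec_capability().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the quoted patch. */
#define DVSEC_BITMAP_OFFSET			0xA
#define MIG_SUPPORTED_WITH_CACHED_RESMEM	(1u << 0)

/*
 * Hypothetical stand-in for pci_read_config_word(): config space is a
 * flat byte array and words are little-endian, as on PCI.
 */
static uint16_t read_cfg_word(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/*
 * Mirrors the logic of the patch: the bug is assumed present unless the
 * DVSEC exists and firmware has set the "fixed" bit in its bitmap.
 */
static bool has_mig_hw_bug(const uint8_t *cfg, unsigned int dvsec_pos)
{
	assert(cfg != NULL);

	if (dvsec_pos) {
		uint16_t bitmap = read_cfg_word(cfg,
						dvsec_pos + DVSEC_BITMAP_OFFSET);

		if (bitmap & MIG_SUPPORTED_WITH_CACHED_RESMEM)
			return false;
	}

	return true;
}
```

With a zeroed cfg buffer the helper reports the bug as present both
when dvsec_pos is 0 and when the bitmap bit is clear; it reports the
bug as absent only once bit 0 of the word at dvsec_pos + 0xA is set.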