Patch "vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem" has been added to the 6.13-stable tree

This is a note to let you know that I've just added the patch titled

    vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem

to the 6.13-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     vfio-nvgrace-gpu-read-dvsec-register-to-determine-ne.patch
and it can be found in the queue-6.13 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 0154831371ee04b01b5c60cfbdc44dce1ba2c82c
Author: Ankit Agrawal <ankita@xxxxxxxxxx>
Date:   Fri Jan 24 18:30:59 2025 +0000

    vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem
    
    [ Upstream commit bd53764a60ad586ad5b6ed339423ad5e67824464 ]
    
    NVIDIA's recently introduced Grace Blackwell (GB) Superchip is a
    continuation of the Grace Hopper (GH) Superchip, which gives the CPU
    and GPU cache-coherent access to each other's memory over an internal
    proprietary chip-to-chip cache-coherent interconnect.
    
    GH systems have a HW defect in their support for the Multi-Instance
    GPU (MIG) feature [1] that necessitated carving a 1G region with an
    uncached mapping out of the device memory. The 1G region is exposed
    as a fake BAR (comprising regions 2 and 3) to work around the issue.
    The defect is fixed on GB systems.
    
    The device firmware reports whether the HW defect is fixed through an
    NVIDIA PCIe DVSEC (Designated Vendor-Specific Extended Capability)
    with DVSEC ID 3 in config space. The module reads it to take a
    different codepath on GB than on GH.
    
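    Scan the DVSEC capabilities to find the one with ID 3, and test bit 0
    (MIG_SUPPORTED_WITH_CACHED_RESMEM) of the 16-bit word at offset 0xA
    within it to determine whether the fix is present. If the DVSEC is
    absent, assume the defect is present. Save the result as
    has_mig_hw_bug in the device's nvgrace_gpu_pci_core_device structure.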
    
    Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
    
    CC: Jason Gunthorpe <jgg@xxxxxxxxxx>
    CC: Kevin Tian <kevin.tian@xxxxxxxxx>
    Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20250124183102.3976-2-ankita@xxxxxxxxxx
    Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index a467085038f0c..b76368958d1c5 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -23,6 +23,11 @@
 /* A hardwired and constant ABI value between the GPU FW and VFIO driver. */
 #define MEMBLK_SIZE SZ_512M
 
+#define DVSEC_BITMAP_OFFSET 0xA
+#define MIG_SUPPORTED_WITH_CACHED_RESMEM BIT(0)
+
+#define GPU_CAP_DVSEC_REGISTER 3
+
 /*
  * The state of the two device memory region - resmem and usemem - is
  * saved as struct mem_region.
@@ -46,6 +51,7 @@ struct nvgrace_gpu_pci_core_device {
 	struct mem_region resmem;
 	/* Lock to control device memory kernel mapping */
 	struct mutex remap_lock;
+	bool has_mig_hw_bug;
 };
 
 static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev)
@@ -812,6 +818,26 @@ nvgrace_gpu_init_nvdev_struct(struct pci_dev *pdev,
 	return ret;
 }
 
+static bool nvgrace_gpu_has_mig_hw_bug(struct pci_dev *pdev)
+{
+	int pcie_dvsec;
+	u16 dvsec_ctrl16;
+
+	pcie_dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_NVIDIA,
+					       GPU_CAP_DVSEC_REGISTER);
+
+	if (pcie_dvsec) {
+		pci_read_config_word(pdev,
+				     pcie_dvsec + DVSEC_BITMAP_OFFSET,
+				     &dvsec_ctrl16);
+
+		if (dvsec_ctrl16 & MIG_SUPPORTED_WITH_CACHED_RESMEM)
+			return false;
+	}
+
+	return true;
+}
+
 static int nvgrace_gpu_probe(struct pci_dev *pdev,
 			     const struct pci_device_id *id)
 {
@@ -832,6 +858,8 @@ static int nvgrace_gpu_probe(struct pci_dev *pdev,
 	dev_set_drvdata(&pdev->dev, &nvdev->core_device);
 
 	if (ops == &nvgrace_gpu_pci_ops) {
+		nvdev->has_mig_hw_bug = nvgrace_gpu_has_mig_hw_bug(pdev);
+
 		/*
 		 * Device memory properties are identified in the host ACPI
 		 * table. Set the nvgrace_gpu_pci_core_device structure.



