On Sun, 6 Oct 2024 10:27:19 +0000 <ankita@xxxxxxxxxx> wrote:

> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> NVIDIA's recently introduced Grace Blackwell (GB) Superchip, a
> continuation of the Grace Hopper (GH) Superchip, provides cache
> coherent access for the CPU and GPU to each other's memory over an
> internal proprietary chip-to-chip (C2C) cache coherent interconnect.
> The in-tree nvgrace-gpu driver manages the GH devices. The intention
> is to extend that support to the new Grace Blackwell boards.

Where do we stand on QEMU enablement of GH, or the GB support here?
IIRC, the nvgrace-gpu variant driver was initially proposed with QEMU
being the means through which the community could make use of this
driver, but there seem to be a number of pieces missing for that
support. Thanks,

Alex

> There is a HW defect on GH affecting support for the Multi-Instance
> GPU (MIG) feature [1] that necessitated carving out a 1G region from
> the device memory and mapping it uncached. The 1G region is exposed
> as a fake BAR (comprising regions 2 and 3) to work around the issue.
>
> The GB systems differ from GH systems in the following aspects:
> 1. The aforementioned HW defect is fixed on GB systems.
> 2. There is a usable BAR1 (regions 2 and 3) on GB systems for the
>    GPUDirect RDMA feature [2].
>
> This patch series accommodates those GB changes by exposing the real
> physical device BAR1 (regions 2 and 3) to the VM instead of the fake
> one. This takes care of both of the differences.
>
> The presence of the fix for the HW defect is communicated by the
> firmware through a DVSEC PCI config register. The module reads this
> register to take a different codepath on GB vs GH.
>
> To improve system bootup time, HBM training is moved out of UEFI on
> GB systems. The driver polls the register indicating the training
> state, and also checks whether the C2C link is ready. The probe
> fails if either check fails.
>
> Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
> Link: https://docs.nvidia.com/cuda/gpudirect-rdma/ [2]
>
> Applied over next-20241003.
>
> Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> Ankit Agrawal (3):
>   vfio/nvgrace-gpu: Read dvsec register to determine need for uncached
>     resmem
>   vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM
>   vfio/nvgrace-gpu: Check the HBM training and C2C link status
>
>  drivers/vfio/pci/nvgrace-gpu/main.c | 115 ++++++++++++++++++++++++++--
>  1 file changed, 107 insertions(+), 8 deletions(-)