Hi folks:

Supporting multiple vGPUs is one of the goals of the next version of the
RFC of NVIDIA vGPU support. Requesting a larger GSP heap size is the
first step.

However, the pre-scrubbed FB memory size on Ada is 256MB. Using a GSP
heap larger than 256MB therefore requires an extra scrubber ucode image
to scrub the FB memory before any other ucode images are executed. As a
result, scrubber ucode image support is a pre-condition for supporting
the maximum number of vGPUs.

I would like to start this RFC for discussion and to collect people's
feedback before the next RFC of NVIDIA vGPU support. In addition, a
kernel doc is attached to explain the story.

This series should also address the comment [1] from Jason in the
RFCv1 [2]. The series can also be found in a repo [3].

Tested on the vGPU RFCv1 repo [2] and on [3] by running Heaven for 3
hours and the Vulkan CTS, without any problems.

PATCH 1 - 2: Factor out some common routines for all the SKUs.
PATCH 3: Load the scrubber ucode image when WPR2 heap size > 256MB.
PATCH 4: Execute the scrubber ucode image when the image firmware is
         loaded.
PATCH 5 - 6: Set the WPR2 heap size to 576MB when vGPU (SRIOV) is
             supported.
PATCH 7: Set the max supported vGPU count when SRIOV is supported.
PATCH 8: Introduce a kernel doc.

Generating the scrubber ucode image
===================================

The following patch is required before generating the scrubber ucode
image via open-gpu-kernel-modules [4]:

diff --git a/nouveau/extract-firmware-nouveau.py b/nouveau/extract-firmware-nouveau.py
index 837edc8d..6268934c 100755
--- a/nouveau/extract-firmware-nouveau.py
+++ b/nouveau/extract-firmware-nouveau.py
@@ -335,7 +335,7 @@ def main():
     booter("ad102", "load", 384)
     booter("ad102", "unload", 384)
     bootloader("ad102", "_prod_")
-    # scrubber("ad102", 384) # Not currently used by Nouveau
+    scrubber("ad102", 384) # Not currently used by Nouveau
 
 if __name__ == "__main__":
     main()

Once the script is patched, it will generate the scrubber ucode image
binary.
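
For readers who want the sizing rule from the patch list in one place,
here is a minimal Python sketch of it. It is only an illustration, not
the nouveau implementation: the function and constant names are
hypothetical, and only the 256MB and 576MB figures come from this
cover letter.

```python
MIB = 1 << 20

# FB memory that is already scrubbed on Ada, as stated in this cover letter.
PRE_SCRUBBED_FB_SIZE = 256 * MIB

# WPR2 heap size the series selects when vGPU (SRIOV) is supported.
VGPU_WPR2_HEAP_SIZE = 576 * MIB


def pick_wpr2_heap_size(default_size: int, sriov_supported: bool) -> int:
    """Hypothetical helper: override the heap size only when SRIOV is supported."""
    return VGPU_WPR2_HEAP_SIZE if sriov_supported else default_size


def needs_scrubber(wpr2_heap_size: int) -> bool:
    """Hypothetical helper: the scrubber ucode is needed only when the heap
    extends past the pre-scrubbed FB region."""
    return wpr2_heap_size > PRE_SCRUBBED_FB_SIZE
```

With these assumptions, a heap of exactly 256MB does not need the
scrubber, while the 576MB heap chosen for SRIOV does.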

[1] https://lore.kernel.org/all/20241015163556.GN3394334@xxxxxxxxxx/
[2] https://lore.kernel.org/all/20240922124951.1946072-1-zhiw@xxxxxxxxxx/
[3] https://github.com/zhiwang-nvidia/linux/tree/zhi/scrubber-support
[4] https://github.com/NVIDIA/open-gpu-kernel-modules/tree/535

Zhi Wang (8):
  drm/nouveau: factor out nvkm_gsp_init_fw_heap()
  drm/nouveau: introduce tu102_gsp_init_fw_heap()
  drm/nouveau: load scrubber ucode image when WPR2 heap size > 256MB
  drm/nouveau: scrub the FB memory when scrubber firmware is loaded
  drm/nouveau: support WPR2 heap size override
  drm/nouveau: override the WPR2 heap size when SRIOV is supported on Ada
  drm/nouveau: set max supported vGPU count when SRIOV is supported
  drm/nouveau: introduce the scrubber on Ada in a kernel doc

 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/ad102.c   | 81 ++++++++++++++++++
 .../gpu/drm/nouveau/nvkm/subdev/gsp/ga100.c   |  1 +
 .../gpu/drm/nouveau/nvkm/subdev/gsp/ga102.c   |  1 +
 .../gpu/drm/nouveau/nvkm/subdev/gsp/priv.h    |  8 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 85 ++++++++++++-------
 .../gpu/drm/nouveau/nvkm/subdev/gsp/tu102.c   |  9 ++
 .../gpu/drm/nouveau/nvkm/subdev/gsp/tu116.c   |  1 +
 8 files changed, 157 insertions(+), 33 deletions(-)

-- 
2.34.1