Hi libvirt experts,

I am starting this email thread to discuss a proposal for integrating vGPU support into libvirt for QEMU.

Some quick background: NVIDIA is implementing a VFIO-based mediated device framework that allows vendors to virtualize their devices without SR-IOV, for example NVIDIA vGPU and Intel KVMGT. Within this framework, we reuse the VFIO API to handle memory and interrupts, just as QEMU does today for passthrough devices.

The difference is that we are introducing a set of new sysfs files for virtual device discovery and life cycle management, since these devices are virtual.

Here is a summary of the sysfs files, when they are created, and how they should be used:

1. Discover mediated device

As part of the physical device initialization process, the vendor driver registers its physical devices with the mediated framework. These physical devices are then used to create virtual devices (mediated devices, aka mdev).

Then the sysfs file "mdev_supported_types" becomes available under the physical device's sysfs directory. It indicates the supported mdev types and configurations for this particular physical device. Its content may change dynamically based on the system's current configuration, so libvirt needs to query this file every time before creating an mdev.

Note: different vendors might have their own specific configuration sysfs files as well, if they don't have pre-defined types.

For example, we have an NVIDIA Tesla M60 registered at 86:00.0 here. To query "mdev_supported_types" on this Tesla M60, which shows the NVIDIA-specific configuration on an idle system:

    cat /sys/bus/pci/devices/0000:86:00.0/mdev_supported_types
    # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer, max_resolution
    11 ,"GRID M60-0B", 16, 2, 45, 512M, 2560x1600
    12 ,"GRID M60-0Q", 16, 2, 60, 512M, 2560x1600
    13 ,"GRID M60-1B", 8, 2, 45, 1024M, 2560x1600
    14 ,"GRID M60-1Q", 8, 2, 60, 1024M, 2560x1600
    15 ,"GRID M60-2B", 4, 2, 45, 2048M, 2560x1600
    16 ,"GRID M60-2Q", 4, 4, 60, 2048M, 2560x1600
    17 ,"GRID M60-4Q", 2, 4, 60, 4096M, 3840x2160
    18 ,"GRID M60-8Q", 1, 4, 60, 8192M, 3840x2160

2. Create/destroy mediated device

Two sysfs files are available under the physical device sysfs path: mdev_create and mdev_destroy.

The syntax for creating an mdev is:

    echo "$mdev_UUID:vendor_specific_argument_list" > /sys/bus/pci/devices/.../mdev_create

The syntax for destroying an mdev is:

    echo "$mdev_UUID:vendor_specific_argument_list" > /sys/bus/pci/devices/.../mdev_destroy

$mdev_UUID is the identifier of the mdev device to be created, and it must be unique per system.

For NVIDIA vGPU, we require a vGPU type identifier (shown as vgpu_type_id in the Tesla M60 output above) and a VM UUID to be passed as the "vendor_specific_argument_list".

If no vendor-specific arguments are required, either "$mdev_UUID" or "$mdev_UUID:" is acceptable as input for the two commands above.

To create an M60-4Q device, libvirt needs to do:

    echo "$mdev_UUID:vgpu_type_id=17,vm_uuid=$VM_UUID" > /sys/bus/pci/devices/0000\:86\:00.0/mdev_create

Then a virtual device shows up at:

    /sys/bus/mdev/devices/$mdev_UUID/

For NVIDIA, if a VM has multiple virtual devices, all of them have to be created up front before any of them is brought online.

Regarding error reporting and detection: on failure, a write() to the sysfs file through a file descriptor returns an error code, and a write from the command line shows the string corresponding to that error code.

3. Start/stop mediated device

Under the virtual device's sysfs directory, you will see a new "online" sysfs file.

You can do

    cat /sys/bus/mdev/devices/$mdev_UUID/online

to get the current status of this virtual device (0 or 1). To start or stop a virtual device, write 1 or 0 respectively:

    echo "1|0" > /sys/bus/mdev/devices/$mdev_UUID/online

libvirt needs to query the current state before changing it.

Note: if you have multiple devices, you need to write to each device's "online" file individually.

For NVIDIA, if there are multiple mdevs per VM, libvirt needs to bring all of them online before starting QEMU.

4. Launch QEMU/VM

Pass the mdev sysfs path to QEMU as a vfio-pci device:

    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$mdev_UUID,id=vgpu0

5. Shutdown sequence

libvirt needs to shut down QEMU, bring the virtual device offline, and then destroy the virtual device.

6. VM Reset

No change or requirement for libvirt, as this is handled via the VFIO reset API and the QEMU process keeps running as before.

7. Hot-plug

It is optional for vendors to support hot-plug.

The syntax for creating a virtual device for hot-plug is the same as above.

For hot-unplug, after executing the QEMU monitor "device_del" command, libvirt needs to write to the "mdev_destroy" sysfs file to complete the hot-unplug process.

Since hot-plug is optional, the mdev_create or mdev_destroy operations may return an error if it is not supported.

Thanks,
Neo

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
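
As a rough illustration of sections 2 to 5 above, the create, online, launch, offline and destroy steps could be wrapped in a few shell helpers. This is only a sketch and has not been run against a real vendor driver: the function names and the SYSFS_ROOT variable are hypothetical (SYSFS_ROOT exists so the helpers can be pointed at a scratch directory instead of the real /sys), while the sysfs file names and the write syntax are the ones described in this mail.

```shell
#!/bin/sh
# Hypothetical helpers for the mdev life cycle described in this mail.
# SYSFS_ROOT is an assumption of this sketch; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Build the string written to mdev_create / mdev_destroy:
# "$uuid" when there are no vendor arguments, otherwise "$uuid:$args".
mdev_request() {
    if [ -n "$2" ]; then
        printf '%s:%s' "$1" "$2"
    else
        printf '%s' "$1"
    fi
}

# mdev_create PCI_ADDR MDEV_UUID [VENDOR_ARGS]
mdev_create() {
    printf '%s\n' "$(mdev_request "$2" "$3")" \
        > "$SYSFS_ROOT/bus/pci/devices/$1/mdev_create"
}

# mdev_destroy PCI_ADDR MDEV_UUID [VENDOR_ARGS]
mdev_destroy() {
    printf '%s\n' "$(mdev_request "$2" "$3")" \
        > "$SYSFS_ROOT/bus/pci/devices/$1/mdev_destroy"
}

# mdev_set_online MDEV_UUID 0|1
# Queries the current state first and only writes when it differs.
mdev_set_online() {
    online_file="$SYSFS_ROOT/bus/mdev/devices/$1/online"
    current=$(cat "$online_file") || return 1
    if [ "$current" = "$2" ]; then
        return 0
    fi
    printf '%s\n' "$2" > "$online_file"
}

# mdev_qemu_arg MDEV_UUID DEVICE_ID
# Prints the -device argument for QEMU.
mdev_qemu_arg() {
    printf '%s\n' "-device vfio-pci,sysfsdev=$SYSFS_ROOT/bus/mdev/devices/$1,id=$2"
}

# mdev_teardown PCI_ADDR MDEV_UUID [VENDOR_ARGS]
# To be called after QEMU has been shut down: offline first, then destroy.
mdev_teardown() {
    mdev_set_online "$2" 0 && mdev_destroy "$1" "$2" "$3"
}

# Example sequence for one device (variables supplied by the caller):
#   mdev_create 0000:86:00.0 "$mdev_UUID" "vgpu_type_id=$TYPE_ID,vm_uuid=$VM_UUID"
#   mdev_set_online "$mdev_UUID" 1
#   qemu-system-x86_64 ... $(mdev_qemu_arg "$mdev_UUID" vgpu0)
#   mdev_teardown 0000:86:00.0 "$mdev_UUID" "vgpu_type_id=$TYPE_ID,vm_uuid=$VM_UUID"
```

For a VM with several mdevs, the create step would be run for every UUID first and the online step for each of them afterwards, before QEMU is started. Each helper returns the exit status of the underlying write, so a failed sysfs write shows up as a non-zero return.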