On Mon, 15 Aug 2016 15:09:30 -0700
Neo Jia <cjia@xxxxxxxxxx> wrote:

> On Mon, Aug 15, 2016 at 09:59:26AM -0600, Alex Williamson wrote:
> > On Mon, 15 Aug 2016 09:38:52 +0000
> > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > 
> > > > From: Kirti Wankhede [mailto:kwankhede@xxxxxxxxxx]
> > > > Sent: Saturday, August 13, 2016 8:37 AM
> > > > 
> > > > On 8/13/2016 2:46 AM, Alex Williamson wrote:
> > > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > > > > 
> > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > > >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> > > > >>>
> > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to get
> > > > >>> the parent_device so it can call the start and stop ops callbacks
> > > > >>> respectively. That seems to imply that all instances for a given
> > > > >>> uuid come from the same parent_device. Where is that enforced? I'm
> > > > >>> still having a hard time buying into the uuid+instance plan when it
> > > > >>> seems like each mdev_device should have an actual unique uuid.
> > > > >>> Userspace tools can figure out which uuids to start for a given user;
> > > > >>> I don't see much value in collecting them into instances within a uuid.
> > > > >>>
> > > > >>
> > > > >> Initially we started the discussion with the VM_UUID+instance
> > > > >> suggestion, where instance was introduced to support multiple devices
> > > > >> in a VM.
> > > > >
> > > > > The instance number was never required in order to support multiple
> > > > > devices in a VM. IIRC, this UUID+instance scheme was to appease NVIDIA
> > > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > > devices with that same UUID and therefore associate udev events to a
> > > > > given VM. Only then does an instance number become necessary, since the
> > > > > UUID needs to be static for the vGPUs within a VM. This has always felt
> > > > > like a very dodgy solution when we should probably just be querying
> > > > > libvirt to give us a device-to-VM association.
> > > 
> > > Agree with Alex here. We'd better not assume that the UUID will be a
> > > VM_UUID for mdev in the basic design. It's bound too tightly to the NVIDIA
> > > management stack.
> > > 
> > > I'm OK with giving enough flexibility for various upper-level management
> > > stacks, e.g. instead of the $UUID+INDEX style, would $UUID+STRING provide
> > > a better option where either UUID or STRING could be optional? The upper
> > > management stack can choose its own policy to identify an mdev:
> > > 
> > > a) $UUID only, so each mdev is allocated a unique UUID
> > > b) STRING only, which could be an index (0, 1, 2, ...), or any combination
> > > (vgpu0, vgpu1, etc.)
> > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > > a numeric index
> > > 
> > > > >> 'mdev_create' creates a device and 'mdev_start' commits the resources
> > > > >> of all instances of similar devices assigned to a VM.
> > > > >>
> > > > >> For example, to create 2 devices:
> > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > > >>
> > > > >> Commit resources for the above devices with a single 'mdev_start':
> > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > > >> 'instance', so 'mdev_create' would look like:
> > > > >>
> > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> where $UUID1 and $UUID2 would be the mdev devices' unique UUIDs and
> > > > >> 'params' would be vendor-specific parameters.
> > > > >>
> > > > >> Device nodes would be created as "$UUID1" and "$UUID2".
> > > > >>
> > > > >> Then 'mdev_start' would be:
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Similarly, 'mdev_stop' and 'mdev_destroy' would be:
> > > > >>
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> > > > >
> > > > > I'm not sure a comma-separated list makes sense here; for both
> > > > > simplicity in the kernel and more fine-grained error reporting, we
> > > > > probably want to start/stop them individually. Actually, why is it
> > > > > that we can't use the mediated device being opened and released to
> > > > > automatically signal to the backend vendor driver to commit and release
> > > > > resources? I don't fully understand why userspace needs this interface.
> > > 
> > > There is a meaningful use of the start/stop interface, as required for
> > > live migration support. Such an interface allows the vendor driver to
> > > quiesce mdev activity on the source device before the mdev hardware state
> > > is snapshotted, and then resume mdev activity on the destination device
> > > after its state is restored. Intel has implemented experimental live
> > > migration support in KVMGT (soon to be released), based on the above two
> > > interfaces (plus another two to get/set mdev state).
> > 
> > Ok, that's actually an interesting use case for start/stop.
> > 
> > > > For the NVIDIA vGPU solution we need to know all devices assigned to a
> > > > VM in one shot to commit resources of all vGPUs assigned to a VM along
> > > > with some common resources.
> > > 
> > > Kirti, can you elaborate on the background of the above one-shot commit
> > > requirement? It's hard to understand such a requirement.
> > 
> > Agree, I know NVIDIA isn't planning to support hotplug initially, but
> > this seems like we're precluding hotplug from the design. I don't
> > understand what's driving this one-shot requirement.
> 
> Hi Alex,
> 
> The requirement here is based on how our internal vGPU device model is
> designed, and with this we are able to pre-allocate the resources required
> for multiple virtual devices within the same domain.
> 
> And I don't think this syntax will stop us from supporting hotplug at all.
> 
> For example, you can always create a virtual mdev and then do
> 
> echo "mdev_UUID" > /sys/class/mdev/mdev_start
> 
> then use the QEMU monitor to add the device for hotplug.

Hi Neo,

I'm still not understanding the advantage you get from the "one-shot"
approach, then, if we can always add more mdevs by starting them later.
Are the hotplug mdevs somehow less capable than the initial set of mdevs
added in a single shot? If the initial set is allocated from the "same
domain", does that give them some sort of hardware locality/resource
benefit?

Thanks,
Alex
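
[Editor's note: for readers piecing together the flow being debated above, a
minimal sketch of the proposed lifecycle, using the per-device-UUID variant
and starting/stopping devices individually as Alex suggests. This only
illustrates the proposal under discussion, not a settled ABI: the parent
device path is abbreviated exactly as in the thread, 'params' stands for the
vendor-specific parameter string, and the mdev_destroy path alongside
mdev_create is an assumption.]

  # UUID1=$(uuidgen); UUID2=$(uuidgen)
  # echo "$UUID1:params" > /sys/devices/../mdev_create
  # echo "$UUID2:params" > /sys/devices/../mdev_create

  # echo "$UUID1" > /sys/class/mdev/mdev_start     # commit resources
  # echo "$UUID2" > /sys/class/mdev/mdev_start

  ... devices in use by the VM ...

  # echo "$UUID1" > /sys/class/mdev/mdev_stop      # quiesce/release resources
  # echo "$UUID2" > /sys/class/mdev/mdev_stop
  # echo "$UUID1" > /sys/devices/../mdev_destroy   # assumed destroy path
  # echo "$UUID2" > /sys/devices/../mdev_destroy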
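
[Editor's note: a rough sketch of the hotplug flow Neo describes, assuming a
VM is already running. The mdev_create/mdev_start lines follow the thread;
the QEMU monitor command and the /sys/bus/mdev/devices path are assumptions
about how the started device would be handed to vfio-pci, since the thread
does not spell out the QEMU side.]

  # UUID3=$(uuidgen)
  # echo "$UUID3:params" > /sys/devices/../mdev_create
  # echo "$UUID3" > /sys/class/mdev/mdev_start

  (qemu) device_add vfio-pci,sysfsdev=/sys/bus/mdev/devices/<UUID3>,id=hostdev1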