On Tue, Aug 16, 2016 at 05:58:54AM +0000, Tian, Kevin wrote: > > From: Neo Jia [mailto:cjia@xxxxxxxxxx] > > Sent: Tuesday, August 16, 2016 1:44 PM > > > > On Tue, Aug 16, 2016 at 04:52:30AM +0000, Tian, Kevin wrote: > > > > From: Neo Jia [mailto:cjia@xxxxxxxxxx] > > > > Sent: Tuesday, August 16, 2016 12:17 PM > > > > > > > > On Tue, Aug 16, 2016 at 03:50:44AM +0000, Tian, Kevin wrote: > > > > > > From: Neo Jia [mailto:cjia@xxxxxxxxxx] > > > > > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > > > > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +0000, Tian, Kevin wrote: > > > > > > > > From: Neo Jia [mailto:cjia@xxxxxxxxxx] > > > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM > > in > > > > > > > > > > one shot to commit resources of all vGPUs assigned to a VM along with > > > > > > > > > > some common resources. > > > > > > > > > > > > > > > > > > Kirti, can you elaborate the background about above one-shot commit > > > > > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > > > > > > > > > As I relied in another mail, I really hope start/stop become a per-mdev > > > > > > > > > attribute instead of global one, e.g.: > > > > > > > > > > > > > > > > > > echo "0/1" > > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > > > > > > > > > In many scenario the user space client may only want to talk to mdev > > > > > > > > > instance directly, w/o need to contact its parent device. Still take > > > > > > > > > live migration for example, I don't think Qemu wants to know parent > > > > > > > > > device of assigned mdev instances. > > > > > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to > > know > > > > > > > > parent device. you can just do > > > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > > > > > > > > > without knowing the parent device. > > > > > > > > > > > > > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > > > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > > > > > > > > > You may also argue why not using a global style: > > > > > > > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > > > > > > > > > There are many similar examples... > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > My response above is to your question about using the global sysfs entry as you > > > > > > don't want to have the global path because > > > > > > > > > > > > "I don't think Qemu wants to know parent device of assigned mdev instances.". > > > > > > > > > > > > So I just want to confirm with you that (in case you miss): > > > > > > > > > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > > > > > > > > > doesn't require the knowledge of parent device. > > > > > > > > > > > > > > > > Qemu is just one example, where your explanation of parent device > > > > > makes sense but still it's not good for Qemu to populate /sys/class/mdev > > > > > directly. Qemu is passed with the actual sysfs path of assigned mdev > > > > > instance, so any mdev attributes touched by Qemu should be put under > > > > > that node (e.g. start/stop for live migration usage as I explained earlier). > > > > > > > > Exactly, qemu is passed with the actual sysfs path. > > > > > > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at all. > > > > > > > > QEMU will take the sysfs path as input: > > > > > > > > -device > > > > > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,i > > > > d=vgpu0 > > > > > > no need of passing "id=vgpu0" here. If necessary you can put id as an attribute > > > under sysfs mdev node: > > > > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id > > > > I think we have moved away from the device index based on Alex's comment, so the > > device path will be: > > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818 > > pass /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818 > as parameter, and then Qemu can access 'id' under that path. You > don't need to pass a separate 'id' field. That's my point. > > > > > > > > > > > > > > > As you are saying in live migration, QEMU needs to access "start" and "stop". Could > > you > > > > please share more details, such as how QEMU access the "start" and "stop" sysfs, > > > > when and what triggers that? > > > > > > > > > > A conceptual flow as below: > > > > > > 1. Quiescent mdev activity on the parent device (e.g. stop scheduling, wait for > > > in-flight DMA completed, etc.) > > > > > > echo "0" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start > > > > > > 2. Save mdev state: > > > > > > cat /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state > xxx > > > > > > 3. xxx will be part of the final VM image and copied to a new machine > > > > > > 4. Allocate/prepare mdev on the new machine for this VM > > > > > > 5. Restore mdev state: > > > > > > cat xxx > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state > > > (might be a different path name) > > > > > > 6. start mdev on the new parent device: > > > > > > echo "1" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start > > > > Thanks for the sequence, so based on above live migration, the access of "start/stop" > > are from other user space program not QEMU process. > > > > (Just to be clear, I am not saying that I won't consider your suggestion of > > accommodating the "start/stop" file from global to mdev node, but I do want to point > > out that keeping them inside global shouldn't impact your live migration sequence > > above.) > > > > come on... I just use above bash command to show the step. Qemu itself definitely > needs to open the sysfs file descriptor and then read/write the fd... If we use VFIO > API as example, it might be more obvious. :-) Hi Kevin, I am not just picking on this example. The only fd that QEMU will access is the VFIO device fd (other than those container/IOMMU type1 stuff), and there is no changes required for QEMU. Thanks, Neo > > Thanks > Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html