On 10/13/2016 3:14 AM, Alex Williamson wrote: > On Thu, 13 Oct 2016 00:32:48 +0530 > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > >> On 10/12/2016 9:29 PM, Alex Williamson wrote: >>> On Wed, 12 Oct 2016 20:43:48 +0530 >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: >>> >>>> On 10/12/2016 7:22 AM, Tian, Kevin wrote: >>>>>> From: Kirti Wankhede [mailto:kwankhede@xxxxxxxxxx] >>>>>> Sent: Wednesday, October 12, 2016 4:45 AM >>>>>>>> +* mdev_supported_types: >>>>>>>> + List of current supported mediated device types and its details are added >>>>>>>> +in this directory in following format: >>>>>>>> + >>>>>>>> +|- <parent phy device> >>>>>>>> +|--- Vendor-specific-attributes [optional] >>>>>>>> +|--- mdev_supported_types >>>>>>>> +| |--- <type id> >>>>>>>> +| | |--- create >>>>>>>> +| | |--- name >>>>>>>> +| | |--- available_instances >>>>>>>> +| | |--- description /class >>>>>>>> +| | |--- [devices] >>>>>>>> +| |--- <type id> >>>>>>>> +| | |--- create >>>>>>>> +| | |--- name >>>>>>>> +| | |--- available_instances >>>>>>>> +| | |--- description /class >>>>>>>> +| | |--- [devices] >>>>>>>> +| |--- <type id> >>>>>>>> +| |--- create >>>>>>>> +| |--- name >>>>>>>> +| |--- available_instances >>>>>>>> +| |--- description /class >>>>>>>> +| |--- [devices] >>>>>>>> + >>>>>>>> +[TBD : description or class is yet to be decided. This will change.] >>>>>>> >>>>>>> I thought that in previous discussions we had agreed to drop >>>>>>> the <type id> concept and use the name as the unique identifier. >>>>>>> When reporting these types in libvirt we won't want to report >>>>>>> the type id values - we'll want the name strings to be unique. >>>>>>> >>>>>> >>>>>> The 'name' might not be unique but type_id will be. For example that Neo >>>>>> pointed out in earlier discussion, virtual devices can come from two >>>>>> different physical devices, end user would be presented with what they >>>>>> had selected but there will be internal implementation differences. In >>>>>> that case 'type_id' will be unique. >>>>>> >>>>> >>>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type >>>>> string (if you still called it <type id>), and then no need of additional >>>>> 'name' field which can be put inside 'description' field. See below quote: >>>>> >>>> >>>> We had internal discussions about this within NVIDIA and found that >>>> 'name' might not be unique where as 'type_id' would be unique. I'm >>>> refering to Neo's mail after that, where Neo do pointed that out. >>>> >>>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html >>> >>> Everyone not privy to those internal discussions, including me, seems to >>> think we dropped type_id and that if a vendor does not have a stable >>> name, they can compose some sort of stable type description based on the >>> name+id, or even vendor+id, ex. NVIDIA-11. So please share why we >>> haven't managed to kill off type_id yet. No matter what internal >>> representation each vendor driver has of "type_id" it seems possible >>> for it to come up with stable string to define a given configuration. >> >> >> The 'type_id' is unique and the 'name' are not, the name is just a >> virtual device name/ human readable name. Because at this moment Intel >> can't define a proper GPU class, we have to add a 'description' field >> there as well to represent the features of this virtual device, once we >> have all agreed with the GPU class and its mandatory attributes, the >> 'description' field can be removed. Here is an example, >> type_id/type_name = NVIDIA_11, >> name=M60-M0Q, >> description=2560x1600, 2 displays, 512MB" >> >> Neo's previous comment only applies to the situation where we will have >> the GPU class or optional attributes defined and recognized by libvirt, >> since that is not going to happen any time soon, we will have to have >> the new 'description' field, and we don't want to have it mixed up with >> 'name' field. >> >> We can definitely have something like name+id as Alex recommended to >> remove the 'name' field, but it will just require libvirt to have more >> logic to parse that string. > > Let's use the mtty example driver provided in patch 5 so we can all > more clearly see how the interfaces work. I'll start from the > beginning of my experience and work my way to the type/name thing. > Thanks for looking into it and getting feel of it. And I hope this helps to understand that 'name' and 'type_id' are different. > (please add a modules_install target to the Makefile) > This is an example and I feel it should not be installed in /lib/modules/../build path. This should be used to understand the interface and the flow of mdev device management life cycle. User can use insmod to load driver: # insmod mtty.ko > # modprobe mtty > > Now what? It seems like I need to have prior knowledge that this > drivers supports mdev devices and I need to go hunt for them. We need > to create a class (ex. /sys/class/mdev/) where a user can find all the > devices that participate in this mediated device infrastructure. That > would point me to /sys/devices/mtty. > You can find devices registered to mdev framework by searching for 'mdev_supported_types' directory at the leaf nodes of devices in /sys/devices directory. Yes, we can have 'mdev' class and links to devices which are registered to mdev framework in /sys/class/mdev/. > # tree /sys/devices/mtty > /sys/devices/mtty > |-- mdev_supported_types > | `-- mtty1 > | |-- available_instances (1) > | |-- create > | |-- devices > | `-- name ("Dual-port-serial") > |-- mtty_dev > | `-- sample_mtty_dev ("This is phy device") > |-- power > | |-- async > | |-- autosuspend_delay_ms > | |-- control > | |-- runtime_active_kids > | |-- runtime_active_time > | |-- runtime_enabled > | |-- runtime_status > | |-- runtime_suspended_time > | `-- runtime_usage > `-- uevent > > Ok, but that was boring, we really need to have at least 2 supported > types to validate the interface, so without changing the actual device > backing, I pretended to have a single port vs dual port: > > /sys/devices/mtty > |-- mdev_supported_types > | |-- mtty1 > | | |-- available_instances (24) > | | |-- create > | | |-- devices > | | `-- name (Single-port-serial) > | `-- mtty2 > | |-- available_instances (12) > | |-- create > | |-- devices > | `-- name (Dual-port-serial) > [snip] > > I arbitrarily decides I have 24 ports and each single port uses 1 port > and each dual port uses 2 ports. > > Before I start creating devices, what are we going to key the libvirt > XML on? Can we do anything to prevent vendors from colliding or do we > have any way to suggest meaningful and unique type_ids? Libvirt would have parent and type_id in XML. No two vendors can own same parent device. So I don't think vendors would collide even having same type_id, since <parent, type_id> pair would always be unique. Presumably if > we had a PCI device hosting this, we would be rooted at that parent > device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0. Maybe > the type_id should automatically be prefixed by the vendor module name, > ex. mtty-1, i915-foo, nvidia-bar. There's something missing for > deterministically creating a "XYZ" device and knowing exactly what that > means and finding a parent device that supports it. > We can prefix type_id with module name, i.e using dev->driver->name, but <parent, type_id> pair is unique so I don't see much benefit in doing that. > Let's get to mdev creating... > > # uuidgen > mdev_supported_types/mtty2/create > # tree /sys/devices/mtty > /sys/devices/mtty > |-- e68189be-700e-41f7-93a3-b5351e79c470 > | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev > | |-- iommu_group -> ../../../kernel/iommu_groups/63 > | |-- mtty2 -> ../mdev_supported_types/mtty2 > | |-- power > | | |-- async > | | |-- autosuspend_delay_ms > | | |-- control > | | |-- runtime_active_kids > | | |-- runtime_active_time > | | |-- runtime_enabled > | | |-- runtime_status > | | |-- runtime_suspended_time > | | `-- runtime_usage > | |-- remove > | |-- subsystem -> ../../../bus/mdev > | |-- uevent > | `-- vendor > | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") > |-- mdev_supported_types > | |-- mtty1 > | | |-- available_instances (22) > | | |-- create > | | |-- devices > | | `-- name > | `-- mtty2 > | |-- available_instances (11) > | |-- create > | |-- devices > | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 > | `-- name > > The mdev device was created directly under the parent, which seems like > it's going to get messy to me (ie. imagine dropping a bunch of uuids > into a PCI parent device's sysfs directory, how does a user know what > they are?). > That is the way devices are placed in sysfs. For example below devices: 80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 1a (rev 07) 80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07) 80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3a in PCI Express Mode (rev 07) 80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 0 (rev 07) 80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 1 (rev 07) 80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 2 (rev 07) 80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 3 (rev 07) 80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 4 (rev 07) 80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 5 (rev 07) 80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 6 (rev 07) 80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 7 (rev 07) 80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07) 80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control Status and Global Errors (rev 07) 80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07) In sysfs, those are in same parent folder of its parent root port: # ls /sys/devices/pci0000\:80/ -l total 0 drwxr-xr-x 8 root root 0 Oct 13 12:08 0000:80:01.0 drwxr-xr-x 7 root root 0 Oct 13 12:08 0000:80:02.0 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:03.0 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.0 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.1 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.2 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.3 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.4 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.5 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.6 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.7 drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.0 drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.2 drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.4 lrwxrwxrwx 1 root root 0 Oct 13 13:25 firmware_node -> ../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01 drwxr-xr-x 3 root root 0 Oct 13 12:08 pci_bus drwxr-xr-x 2 root root 0 Oct 13 12:08 power -rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent > Under the device we have "mtty2", shouldn't that be > "mdev_supported_type", which then links to mtty2? Otherwise a user > needs to decode from the link what this attribute is. > I thought it should show type, so that by looking at 'ls' output user should be able to find type_id. > Also here's an example of those vendor sysfs entries per device. So > long as the vendor never expects a tool like libvirt to manipulate > attributes there, I can see how that could be pretty powerful. > Yes, it is good to have vendor specific entries, libvirt might not report/use it. That would be more useful for system admin to get extra information manually that libvirt doesn't report. > Moving down to the mdev_supported_types, I've updated mtty so that it > actually adjusts available instance, and we can now see a link under > the devices for mtty2. > > Also worth noting is that a link for the device appears > in /sys/bus/mdev/devices. > > BTW, specifying this device for QEMU vfio-pci is where the sysfsdev > option comes into play: > > -device > vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470 > > Which raises another question, we can tell through the vfio interfaces > that this is exposes as a PCI device, by creating a container > (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)), > adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)), > getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally > getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the > flag bit that says the API is PCI. That's a long path to go and has > stumbling blocks like the type of iommu that's available for the given > platform. How do we make that manageable? Do you want device type to be expressed in sysfs? Then that should be done from vendor driver. vfio_mdev module is now a shim layer, so mdev core or vfio_mdev module don't know what device type flag vendor driver had set. Thanks, Kirti > I don't think we want to > create some artificial relationship that the type of the parent > necessarily match the type of the child mdev, we've already broken that > with a simple mdev tty driver. > > One more: > > # uuidgen > mdev_supported_types/mtty1/create > # tree /sys/devices/mtty > /sys/devices/mtty > |-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 > | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev > | |-- iommu_group -> ../../../kernel/iommu_groups/64 > | |-- mtty1 -> ../mdev_supported_types/mtty1 > | |-- power > [snip] > | |-- remove > | |-- subsystem -> ../../../bus/mdev > | |-- uevent > | `-- vendor > | `-- sample_mdev_dev ("This is MDEV a7ae17d1-2de4-44c2-ae58-20ae0a0befe8") > |-- e68189be-700e-41f7-93a3-b5351e79c470 > | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev > | |-- iommu_group -> ../../../kernel/iommu_groups/63 > | |-- mtty2 -> ../mdev_supported_types/mtty2 > | |-- power > [snip] > | |-- remove > | |-- subsystem -> ../../../bus/mdev > | |-- uevent > | `-- vendor > | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") > |-- mdev_supported_types > | |-- mtty1 > | | |-- available_instances (21) > | | |-- create > | | |-- devices > | | | `-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 -> ../../../a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 > | | `-- name > | `-- mtty2 > | |-- available_instances (10) > | |-- create > | |-- devices > | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 > | `-- name > > Hopefully as expected with the caveats for the first example. > > # echo 1 > a7ae17d1-2de4-44c2-ae58-20ae0a0befe8/remove > # echo 1 > e68189be-700e-41f7-93a3-b5351e79c470/remove > > These do what they're supposed to, the devices are gone. > > Ok, I've identified some issues, let's figure out how to resolve them. > Thanks, > > Alex > > (hack multi-port mtty patch attached) > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html