On 10/13/2016 8:06 PM, Alex Williamson wrote: > On Thu, 13 Oct 2016 14:52:09 +0530 > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > >> On 10/13/2016 3:14 AM, Alex Williamson wrote: >>> On Thu, 13 Oct 2016 00:32:48 +0530 >>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: >>> >>>> On 10/12/2016 9:29 PM, Alex Williamson wrote: >>>>> On Wed, 12 Oct 2016 20:43:48 +0530 >>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: >>>>> >>>>>> On 10/12/2016 7:22 AM, Tian, Kevin wrote: >>>>>>>> From: Kirti Wankhede [mailto:kwankhede@xxxxxxxxxx] >>>>>>>> Sent: Wednesday, October 12, 2016 4:45 AM >>>>>>>>>> +* mdev_supported_types: >>>>>>>>>> + List of current supported mediated device types and its details are added >>>>>>>>>> +in this directory in following format: >>>>>>>>>> + >>>>>>>>>> +|- <parent phy device> >>>>>>>>>> +|--- Vendor-specific-attributes [optional] >>>>>>>>>> +|--- mdev_supported_types >>>>>>>>>> +| |--- <type id> >>>>>>>>>> +| | |--- create >>>>>>>>>> +| | |--- name >>>>>>>>>> +| | |--- available_instances >>>>>>>>>> +| | |--- description /class >>>>>>>>>> +| | |--- [devices] >>>>>>>>>> +| |--- <type id> >>>>>>>>>> +| | |--- create >>>>>>>>>> +| | |--- name >>>>>>>>>> +| | |--- available_instances >>>>>>>>>> +| | |--- description /class >>>>>>>>>> +| | |--- [devices] >>>>>>>>>> +| |--- <type id> >>>>>>>>>> +| |--- create >>>>>>>>>> +| |--- name >>>>>>>>>> +| |--- available_instances >>>>>>>>>> +| |--- description /class >>>>>>>>>> +| |--- [devices] >>>>>>>>>> + >>>>>>>>>> +[TBD : description or class is yet to be decided. This will change.] >>>>>>>>> >>>>>>>>> I thought that in previous discussions we had agreed to drop >>>>>>>>> the <type id> concept and use the name as the unique identifier. >>>>>>>>> When reporting these types in libvirt we won't want to report >>>>>>>>> the type id values - we'll want the name strings to be unique. >>>>>>>>> >>>>>>>> >>>>>>>> The 'name' might not be unique but type_id will be. For example that Neo >>>>>>>> pointed out in earlier discussion, virtual devices can come from two >>>>>>>> different physical devices, end user would be presented with what they >>>>>>>> had selected but there will be internal implementation differences. In >>>>>>>> that case 'type_id' will be unique. >>>>>>>> >>>>>>> >>>>>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type >>>>>>> string (if you still called it <type id>), and then no need of additional >>>>>>> 'name' field which can be put inside 'description' field. See below quote: >>>>>>> >>>>>> >>>>>> We had internal discussions about this within NVIDIA and found that >>>>>> 'name' might not be unique where as 'type_id' would be unique. I'm >>>>>> refering to Neo's mail after that, where Neo do pointed that out. >>>>>> >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html >>>>> >>>>> Everyone not privy to those internal discussions, including me, seems to >>>>> think we dropped type_id and that if a vendor does not have a stable >>>>> name, they can compose some sort of stable type description based on the >>>>> name+id, or even vendor+id, ex. NVIDIA-11. So please share why we >>>>> haven't managed to kill off type_id yet. No matter what internal >>>>> representation each vendor driver has of "type_id" it seems possible >>>>> for it to come up with stable string to define a given configuration. >>>> >>>> >>>> The 'type_id' is unique and the 'name' are not, the name is just a >>>> virtual device name/ human readable name. Because at this moment Intel >>>> can't define a proper GPU class, we have to add a 'description' field >>>> there as well to represent the features of this virtual device, once we >>>> have all agreed with the GPU class and its mandatory attributes, the >>>> 'description' field can be removed. Here is an example, >>>> type_id/type_name = NVIDIA_11, >>>> name=M60-M0Q, >>>> description=2560x1600, 2 displays, 512MB" >>>> >>>> Neo's previous comment only applies to the situation where we will have >>>> the GPU class or optional attributes defined and recognized by libvirt, >>>> since that is not going to happen any time soon, we will have to have >>>> the new 'description' field, and we don't want to have it mixed up with >>>> 'name' field. >>>> >>>> We can definitely have something like name+id as Alex recommended to >>>> remove the 'name' field, but it will just require libvirt to have more >>>> logic to parse that string. >>> >>> Let's use the mtty example driver provided in patch 5 so we can all >>> more clearly see how the interfaces work. I'll start from the >>> beginning of my experience and work my way to the type/name thing. >>> >> >> Thanks for looking into it and getting feel of it. And I hope this helps >> to understand that 'name' and 'type_id' are different. >> >> >>> (please add a modules_install target to the Makefile) >>> >> >> This is an example and I feel it should not be installed in >> /lib/modules/../build path. This should be used to understand the >> interface and the flow of mdev device management life cycle. User can >> use insmod to load driver: >> >> # insmod mtty.ko > > It's not built by default, that's sufficient. Providing a > modules_install target makes it more accessible for testing and allows > easier testing of module dependencies with modprobe. insmod does not > exercise the automatic module dependency loading. > >>> # modprobe mtty >>> >>> Now what? It seems like I need to have prior knowledge that this >>> drivers supports mdev devices and I need to go hunt for them. We need >>> to create a class (ex. /sys/class/mdev/) where a user can find all the >>> devices that participate in this mediated device infrastructure. That >>> would point me to /sys/devices/mtty. >>> >> >> You can find devices registered to mdev framework by searching for >> 'mdev_supported_types' directory at the leaf nodes of devices in >> /sys/devices directory. Yes, we can have 'mdev' class and links to >> devices which are registered to mdev framework in /sys/class/mdev/. >> >> >>> # tree /sys/devices/mtty >>> /sys/devices/mtty >>> |-- mdev_supported_types >>> | `-- mtty1 >>> | |-- available_instances (1) >>> | |-- create >>> | |-- devices >>> | `-- name ("Dual-port-serial") >>> |-- mtty_dev >>> | `-- sample_mtty_dev ("This is phy device") >>> |-- power >>> | |-- async >>> | |-- autosuspend_delay_ms >>> | |-- control >>> | |-- runtime_active_kids >>> | |-- runtime_active_time >>> | |-- runtime_enabled >>> | |-- runtime_status >>> | |-- runtime_suspended_time >>> | `-- runtime_usage >>> `-- uevent >>> >>> Ok, but that was boring, we really need to have at least 2 supported >>> types to validate the interface, so without changing the actual device >>> backing, I pretended to have a single port vs dual port: >>> >>> /sys/devices/mtty >>> |-- mdev_supported_types >>> | |-- mtty1 >>> | | |-- available_instances (24) >>> | | |-- create >>> | | |-- devices >>> | | `-- name (Single-port-serial) >>> | `-- mtty2 >>> | |-- available_instances (12) >>> | |-- create >>> | |-- devices >>> | `-- name (Dual-port-serial) >>> [snip] >>> >>> I arbitrarily decides I have 24 ports and each single port uses 1 port >>> and each dual port uses 2 ports. >>> >>> Before I start creating devices, what are we going to key the libvirt >>> XML on? Can we do anything to prevent vendors from colliding or do we >>> have any way to suggest meaningful and unique type_ids? >> >> Libvirt would have parent and type_id in XML. No two vendors can own >> same parent device. So I don't think vendors would collide even having >> same type_id, since <parent, type_id> pair would always be unique. > > > We have a goal of supporting migration with mdev devices, Intel has > already shown this is possible. Tying the XML representation of an > mdev device to a parent device is directly contradictory to that goal. > libvirt needs a token which is unique across vendor to be able to > instantiate an mdev device. <parent, type_id> is unacceptable. > Ok, I'll have dev->driver->name prefix for type_id from mdev core module. In vendor driver attribute functions, they should also use same format to identify the type. >> Presumably if >>> we had a PCI device hosting this, we would be rooted at that parent >>> device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0. Maybe >>> the type_id should automatically be prefixed by the vendor module name, >>> ex. mtty-1, i915-foo, nvidia-bar. There's something missing for >>> deterministically creating a "XYZ" device and knowdev->driver->nameing exactly what that >>> means and finding a parent device that supports it. >>> >> >> We can prefix type_id with module name, i.e using , but >> <parent, type_id> pair is unique so I don't see much benefit in doing >> that. >> >> >>> Let's get to mdev creating... >>> >>> # uuidgen > mdev_supported_types/mtty2/create >>> # tree /sys/devices/mtty >>> /sys/devices/mtty >>> |-- e68189be-700e-41f7-93a3-b5351e79c470 >>> | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev >>> | |-- iommu_group -> ../../../kernel/iommu_groups/63 >>> | |-- mtty2 -> ../mdev_supported_types/mtty2 >>> | |-- power >>> | | |-- async >>> | | |-- autosuspend_delay_ms >>> | | |-- control >>> | | |-- runtime_active_kids >>> | | |-- runtime_active_time >>> | | |-- runtime_enabled >>> | | |-- runtime_status >>> | | |-- runtime_suspended_time >>> | | `-- runtime_usage >>> | |-- remove >>> | |-- subsystem -> ../../../bus/mdev >>> | |-- uevent >>> | `-- vendor >>> | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") >>> |-- mdev_supported_types >>> | |-- mtty1 >>> | | |-- available_instances (22) >>> | | |-- create >>> | | |-- devices >>> | | `-- name >>> | `-- mtty2 >>> | |-- available_instances (11) >>> | |-- create >>> | |-- devices >>> | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 >>> | `-- name >>> >>> The mdev device was created directly under the parent, which seems like >>> it's going to get messy to me (ie. imagine dropping a bunch of uuids >>> into a PCI parent device's sysfs directory, how does a user know what >>> they are?). >>> >> >> That is the way devices are placed in sysfs. For example below devices: >> >> 80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express >> Root Port 1a (rev 07) >> 80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express >> Root Port 2a (rev 07) >> 80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express >> Root Port 3a in PCI Express Mode (rev 07) >> 80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 0 (rev 07) >> 80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 1 (rev 07) >> 80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 2 (rev 07) >> 80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 3 (rev 07) >> 80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 4 (rev 07) >> 80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 5 (rev 07) >> 80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 6 (rev 07) >> 80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 7 (rev 07) >> 80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address >> Map, VTd_Misc, System Management (rev 07) >> 80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control >> Status and Global Errors (rev 07) >> 80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07) >> >> In sysfs, those are in same parent folder of its parent root port: >> >> # ls /sys/devices/pci0000\:80/ -l >> total 0 >> drwxr-xr-x 8 root root 0 Oct 13 12:08 0000:80:01.0 >> drwxr-xr-x 7 root root 0 Oct 13 12:08 0000:80:02.0 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:03.0 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.0 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.1 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.2 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.3 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.4 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.5 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.6 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.7 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.0 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.2 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.4 >> lrwxrwxrwx 1 root root 0 Oct 13 13:25 firmware_node -> >> ../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 pci_bus >> drwxr-xr-x 2 root root 0 Oct 13 12:08 power >> -rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent > > I agree, that's also messy, but a PCI device has a standard PCI address > format, so just by looking at those you know it's a child PCI device. > We can write code that understands how to parse that and understands > what it is by the address. On the other hand we're dropping uuids into > a directory. We can write code that understands how to parse it, but > how do we know that it's a child device vs some other attribute for the > parent? > mdev child will have 'mdev_supported_type' link in its directory. That should be sufficient to identify it's child and not other attribute. >>> Under the device we have "mtty2", shouldn't that be >>> "mdev_supported_type", which then links to mtty2? Otherwise a user >>> needs to decode from the link what this attribute is. >>> >> >> I thought it should show type, so that by looking at 'ls' output user >> should be able to find type_id. > > The type_id should be shown by actually reading the link, not by the > link name itself, the same way that the iommu_group link for a device > isn't the group number, it links to the group number but uses a > standard link name. > Ok. I'll rename the link name to 'mdev_supported_type' >>> Also here's an example of those vendor sysfs entries per device. So >>> long as the vendor never expects a tool like libvirt to manipulate >>> attributes there, I can see how that could be pretty powerful. >>> >> >> Yes, it is good to have vendor specific entries, libvirt might not >> report/use it. That would be more useful for system admin to get extra >> information manually that libvirt doesn't report. >> >> >>> Moving down to the mdev_supported_types, I've updated mtty so that it >>> actually adjusts available instance, and we can now see a link under >>> the devices for mtty2. >>> >>> Also worth noting is that a link for the device appears >>> in /sys/bus/mdev/devices. >>> >>> BTW, specifying this device for QEMU vfio-pci is where the sysfsdev >>> option comes into play: >>> >>> -device >>> vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470 >>> >>> Which raises another question, we can tell through the vfio interfaces >>> that this is exposes as a PCI device, by creating a container >>> (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)), >>> adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)), >>> getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally >>> getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the >>> flag bit that says the API is PCI. That's a long path to go and has >>> stumbling blocks like the type of iommu that's available for the given >>> platform. How do we make that manageable? >> >> Do you want device type to be expressed in sysfs? Then that should be >> done from vendor driver. vfio_mdev module is now a shim layer, so mdev >> core or vfio_mdev module don't know what device type flag vendor driver >> had set. > > Right, the vendor driver would need to expose this, the mdev layers are > device agnostic, they don't know or care which device API is being > exposed. The other question is whether it needs to be part of the > initial implementation or can we assume pci for now and add something > later. I guess we already have our proof to the contrary with the > IBM ccw device that libvirt can't simply assume pci. I see that many > devices in sysfs have a subsystem link, which seems rather appropriate, > but we're not creating a real pci device, so linking to /sys/bus/pci > or /sys/class/pci_bus both seem invalid. Is that a dead end? We could > always expose vfio_device_info.flags, but that seems pretty ugly as > well, plus the sysfs mdev interface is not vfio specific. What if we > had a "device_api" attribute which the vendor driver would show as > "vfio-pci"? Therefore the mdev interface is not tied to vfio, but we > clearly show that a given type_id clearly exports a vfio-pci > interface. Thanks, > We can't use subsystem for mdev device. Kernel's device core framework adds subsystem link to mdev device folder as: subsystem -> ../../../../../../../bus/mdev We will have an mandatory attribute in 'supported_type_groups' which should show "vfio-pci" or "vfio-platform" based on the device flag vendor driver is going to set for that type. Thanks, Kirti -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html