Re: [Qemu-devel] [PATCH v8 4/6] docs: Add Documentation for Mediated devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 10/13/2016 3:14 AM, Alex Williamson wrote:
> On Thu, 13 Oct 2016 00:32:48 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> 
>> On 10/12/2016 9:29 PM, Alex Williamson wrote:
>>> On Wed, 12 Oct 2016 20:43:48 +0530
>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>   
>>>> On 10/12/2016 7:22 AM, Tian, Kevin wrote:  
>>>>>> From: Kirti Wankhede [mailto:kwankhede@xxxxxxxxxx]
>>>>>> Sent: Wednesday, October 12, 2016 4:45 AM    
>>>>>>>> +* mdev_supported_types:
>>>>>>>> +    List of current supported mediated device types and its details are added
>>>>>>>> +in this directory in following format:
>>>>>>>> +
>>>>>>>> +|- <parent phy device>
>>>>>>>> +|--- Vendor-specific-attributes [optional]
>>>>>>>> +|--- mdev_supported_types
>>>>>>>> +|     |--- <type id>
>>>>>>>> +|     |   |--- create
>>>>>>>> +|     |   |--- name
>>>>>>>> +|     |   |--- available_instances
>>>>>>>> +|     |   |--- description /class
>>>>>>>> +|     |   |--- [devices]
>>>>>>>> +|     |--- <type id>
>>>>>>>> +|     |   |--- create
>>>>>>>> +|     |   |--- name
>>>>>>>> +|     |   |--- available_instances
>>>>>>>> +|     |   |--- description /class
>>>>>>>> +|     |   |--- [devices]
>>>>>>>> +|     |--- <type id>
>>>>>>>> +|          |--- create
>>>>>>>> +|          |--- name
>>>>>>>> +|          |--- available_instances
>>>>>>>> +|          |--- description /class
>>>>>>>> +|          |--- [devices]
>>>>>>>> +
>>>>>>>> +[TBD : description or class is yet to be decided. This will change.]    
>>>>>>>
>>>>>>> I thought that in previous discussions we had agreed to drop
>>>>>>> the <type id> concept and use the name as the unique identifier.
>>>>>>> When reporting these types in libvirt we won't want to report
>>>>>>> the type id values - we'll want the name strings to be unique.
>>>>>>>    
>>>>>>
>>>>>> The 'name' might not be unique but type_id will be. For example that Neo
>>>>>> pointed out in earlier discussion, virtual devices can come from two
>>>>>> different physical devices, end user would be presented with what they
>>>>>> had selected but there will be internal implementation differences. In
>>>>>> that case 'type_id' will be unique.
>>>>>>    
>>>>>
>>>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type
>>>>> string (if you still called it <type id>), and then no need of additional
>>>>> 'name' field which can be put inside 'description' field. See below quote:
>>>>>     
>>>>
>>>> We had internal discussions about this within NVIDIA and found that
>>>> 'name' might not be unique where as 'type_id' would be unique. I'm
>>>> refering to Neo's mail after that, where Neo do pointed that out.
>>>>
>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html  
>>>
>>> Everyone not privy to those internal discussions, including me, seems to
>>> think we dropped type_id and that if a vendor does not have a stable
>>> name, they can compose some sort of stable type description based on the
>>> name+id, or even vendor+id, ex. NVIDIA-11.  So please share why we
>>> haven't managed to kill off type_id yet.  No matter what internal
>>> representation each vendor driver has of "type_id" it seems possible
>>> for it to come up with stable string to define a given configuration.  
>>
>>
>> The 'type_id' is unique and the 'name' are not, the name is just a
>> virtual device name/ human readable name. Because at this moment Intel
>> can't define a proper GPU class, we have to add a 'description' field
>> there as well to represent the features of this virtual device, once we
>> have all agreed with the GPU class and its mandatory attributes, the
>> 'description' field can be removed. Here is an example,
>> type_id/type_name = NVIDIA_11,
>> name=M60-M0Q,
>> description=2560x1600, 2 displays, 512MB"
>>
>> Neo's previous comment only applies to the situation where we will have
>> the GPU class or optional attributes defined and recognized by libvirt,
>> since that is not going to happen any time soon, we will have to have
>> the new 'description' field, and we don't want to have it mixed up with
>> 'name' field.
>>
>> We can definitely have something like name+id as Alex recommended to
>> remove the 'name' field, but it will just require libvirt to have more
>> logic to parse that string.
> 
> Let's use the mtty example driver provided in patch 5 so we can all
> more clearly see how the interfaces work.  I'll start from the
> beginning of my experience and work my way to the type/name thing.
> 

Thanks for looking into it and getting feel of it. And I hope this helps
to understand that 'name' and 'type_id' are different.


> (please add a modules_install target to the Makefile)
>

This is an example and I feel it should not be installed in
/lib/modules/../build path. This should be used to understand the
interface and the flow of mdev device management life cycle. User can
use insmod to load driver:

# insmod mtty.ko

> # modprobe mtty
> 
> Now what?  It seems like I need to have prior knowledge that this
> drivers supports mdev devices and I need to go hunt for them.  We need
> to create a class (ex. /sys/class/mdev/) where a user can find all the
> devices that participate in this mediated device infrastructure.  That
> would point me to /sys/devices/mtty.
> 

You can find devices registered to mdev framework by searching for
'mdev_supported_types' directory at the leaf nodes of devices in
/sys/devices directory. Yes, we can have 'mdev' class and links to
devices which are registered to mdev framework in /sys/class/mdev/.


> # tree /sys/devices/mtty
> /sys/devices/mtty
> |-- mdev_supported_types
> |   `-- mtty1
> |       |-- available_instances (1)
> |       |-- create
> |       |-- devices
> |       `-- name ("Dual-port-serial")
> |-- mtty_dev
> |   `-- sample_mtty_dev ("This is phy device")
> |-- power
> |   |-- async
> |   |-- autosuspend_delay_ms
> |   |-- control
> |   |-- runtime_active_kids
> |   |-- runtime_active_time
> |   |-- runtime_enabled
> |   |-- runtime_status
> |   |-- runtime_suspended_time
> |   `-- runtime_usage
> `-- uevent
> 
> Ok, but that was boring, we really need to have at least 2 supported
> types to validate the interface, so without changing the actual device
> backing, I pretended to have a single port vs dual port:
> 
> /sys/devices/mtty
> |-- mdev_supported_types
> |   |-- mtty1
> |   |   |-- available_instances (24)
> |   |   |-- create
> |   |   |-- devices
> |   |   `-- name (Single-port-serial)
> |   `-- mtty2
> |       |-- available_instances (12)
> |       |-- create
> |       |-- devices
> |       `-- name (Dual-port-serial)
> [snip]
> 
> I arbitrarily decides I have 24 ports and each single port uses 1 port
> and each dual port uses 2 ports.
> 
> Before I start creating devices, what are we going to key the libvirt
> XML on?  Can we do anything to prevent vendors from colliding or do we
> have any way to suggest meaningful and unique type_ids? 

Libvirt would have parent and type_id in XML. No two vendors can own
same parent device. So I don't think vendors would collide even having
same type_id, since <parent, type_id> pair would always be unique.


 Presumably if
> we had a PCI device hosting this, we would be rooted at that parent
> device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0.  Maybe
> the type_id should automatically be prefixed by the vendor module name,
> ex. mtty-1, i915-foo, nvidia-bar.  There's something missing for
> deterministically creating a "XYZ" device and knowing exactly what that
> means and finding a parent device that supports it.
> 

We can prefix type_id with module name, i.e using dev->driver->name, but
<parent, type_id> pair is unique so I don't see much benefit in doing
that.


> Let's get to mdev creating...
> 
> # uuidgen > mdev_supported_types/mtty2/create
> # tree /sys/devices/mtty
> /sys/devices/mtty
> |-- e68189be-700e-41f7-93a3-b5351e79c470
> |   |-- driver -> ../../../bus/mdev/drivers/vfio_mdev
> |   |-- iommu_group -> ../../../kernel/iommu_groups/63
> |   |-- mtty2 -> ../mdev_supported_types/mtty2
> |   |-- power
> |   |   |-- async
> |   |   |-- autosuspend_delay_ms
> |   |   |-- control
> |   |   |-- runtime_active_kids
> |   |   |-- runtime_active_time
> |   |   |-- runtime_enabled
> |   |   |-- runtime_status
> |   |   |-- runtime_suspended_time
> |   |   `-- runtime_usage
> |   |-- remove
> |   |-- subsystem -> ../../../bus/mdev
> |   |-- uevent
> |   `-- vendor
> |       `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470")
> |-- mdev_supported_types
> |   |-- mtty1
> |   |   |-- available_instances (22)
> |   |   |-- create
> |   |   |-- devices
> |   |   `-- name
> |   `-- mtty2
> |       |-- available_instances (11)
> |       |-- create
> |       |-- devices
> |       |   `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470
> |       `-- name
> 
> The mdev device was created directly under the parent, which seems like
> it's going to get messy to me (ie. imagine dropping a bunch of uuids
> into a PCI parent device's sysfs directory, how does a user know what
> they are?).
> 

That is the way devices are placed in sysfs. For example below devices:

80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
Root Port 1a (rev 07)
80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
Root Port 2a (rev 07)
80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
Root Port 3a in PCI Express Mode (rev 07)
80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
0 (rev 07)
80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
1 (rev 07)
80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
2 (rev 07)
80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
3 (rev 07)
80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
4 (rev 07)
80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
5 (rev 07)
80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
6 (rev 07)
80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
7 (rev 07)
80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address
Map, VTd_Misc, System Management (rev 07)
80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control
Status and Global Errors (rev 07)
80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07)

In sysfs, those are in same parent folder of its parent root port:

# ls /sys/devices/pci0000\:80/ -l
total 0
drwxr-xr-x 8 root root    0 Oct 13 12:08 0000:80:01.0
drwxr-xr-x 7 root root    0 Oct 13 12:08 0000:80:02.0
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:03.0
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.0
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.1
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.2
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.3
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.4
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.5
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.6
drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.7
drwxr-xr-x 3 root root    0 Oct 13 12:08 0000:80:05.0
drwxr-xr-x 3 root root    0 Oct 13 12:08 0000:80:05.2
drwxr-xr-x 3 root root    0 Oct 13 12:08 0000:80:05.4
lrwxrwxrwx 1 root root    0 Oct 13 13:25 firmware_node ->
../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01
drwxr-xr-x 3 root root    0 Oct 13 12:08 pci_bus
drwxr-xr-x 2 root root    0 Oct 13 12:08 power
-rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent


> Under the device we have "mtty2", shouldn't that be
> "mdev_supported_type", which then links to mtty2?  Otherwise a user
> needs to decode from the link what this attribute is.
> 

I thought it should show type, so that by looking at 'ls' output user
should be able to find type_id.

> Also here's an example of those vendor sysfs entries per device.  So
> long as the vendor never expects a tool like libvirt to manipulate
> attributes there, I can see how that could be pretty powerful.
> 

Yes, it is good to have vendor specific entries, libvirt might not
report/use it. That would be more useful for system admin to get extra
information manually that libvirt doesn't report.


> Moving down to the mdev_supported_types, I've updated mtty so that it
> actually adjusts available instance, and we can now see a link under
> the devices for mtty2.
> 
> Also worth noting is that a link for the device appears
> in /sys/bus/mdev/devices.
> 
> BTW, specifying this device for QEMU vfio-pci is where the sysfsdev
> option comes into play:
> 
> -device
> vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470
> 
> Which raises another question, we can tell through the vfio interfaces
> that this is exposes as a PCI device, by creating a container
> (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)),
> adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)),
> getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally
> getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the
> flag bit that says the API is PCI.  That's a long path to go and has
> stumbling blocks like the type of iommu that's available for the given
> platform.  How do we make that manageable? 

Do you want device type to be expressed in sysfs? Then that should be
done from vendor driver. vfio_mdev module is now a shim layer, so mdev
core or vfio_mdev module don't know what device type flag vendor driver
had set.

Thanks,
Kirti

> I don't think we want to
> create some artificial relationship that the type of the parent
> necessarily match the type of the child mdev, we've already broken that
> with a simple mdev tty driver.
> 
> One more:
> 
> # uuidgen > mdev_supported_types/mtty1/create
> # tree /sys/devices/mtty
> /sys/devices/mtty
> |-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8
> |   |-- driver -> ../../../bus/mdev/drivers/vfio_mdev
> |   |-- iommu_group -> ../../../kernel/iommu_groups/64
> |   |-- mtty1 -> ../mdev_supported_types/mtty1
> |   |-- power
> [snip]
> |   |-- remove
> |   |-- subsystem -> ../../../bus/mdev
> |   |-- uevent
> |   `-- vendor
> |       `-- sample_mdev_dev ("This is MDEV a7ae17d1-2de4-44c2-ae58-20ae0a0befe8")
> |-- e68189be-700e-41f7-93a3-b5351e79c470
> |   |-- driver -> ../../../bus/mdev/drivers/vfio_mdev
> |   |-- iommu_group -> ../../../kernel/iommu_groups/63
> |   |-- mtty2 -> ../mdev_supported_types/mtty2
> |   |-- power
> [snip]
> |   |-- remove
> |   |-- subsystem -> ../../../bus/mdev
> |   |-- uevent
> |   `-- vendor
> |       `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470")
> |-- mdev_supported_types
> |   |-- mtty1
> |   |   |-- available_instances (21)
> |   |   |-- create
> |   |   |-- devices
> |   |   |   `-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 -> ../../../a7ae17d1-2de4-44c2-ae58-20ae0a0befe8
> |   |   `-- name
> |   `-- mtty2
> |       |-- available_instances (10)
> |       |-- create
> |       |-- devices
> |       |   `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470
> |       `-- name
> 
> Hopefully as expected with the caveats for the first example.
> 
> # echo 1 > a7ae17d1-2de4-44c2-ae58-20ae0a0befe8/remove
> # echo 1 > e68189be-700e-41f7-93a3-b5351e79c470/remove
> 
> These do what they're supposed to, the devices are gone.
> 
> Ok, I've identified some issues, let's figure out how to resolve them.
> Thanks,
> 
> Alex
> 
> (hack multi-port mtty patch attached)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux