Re: [Qemu-devel] [PATCH v8 4/6] docs: Add Documentation for Mediated devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 10/13/2016 8:06 PM, Alex Williamson wrote:
> On Thu, 13 Oct 2016 14:52:09 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> 
>> On 10/13/2016 3:14 AM, Alex Williamson wrote:
>>> On Thu, 13 Oct 2016 00:32:48 +0530
>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>   
>>>> On 10/12/2016 9:29 PM, Alex Williamson wrote:  
>>>>> On Wed, 12 Oct 2016 20:43:48 +0530
>>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>>>     
>>>>>> On 10/12/2016 7:22 AM, Tian, Kevin wrote:    
>>>>>>>> From: Kirti Wankhede [mailto:kwankhede@xxxxxxxxxx]
>>>>>>>> Sent: Wednesday, October 12, 2016 4:45 AM      
>>>>>>>>>> +* mdev_supported_types:
>>>>>>>>>> +    List of current supported mediated device types and its details are added
>>>>>>>>>> +in this directory in following format:
>>>>>>>>>> +
>>>>>>>>>> +|- <parent phy device>
>>>>>>>>>> +|--- Vendor-specific-attributes [optional]
>>>>>>>>>> +|--- mdev_supported_types
>>>>>>>>>> +|     |--- <type id>
>>>>>>>>>> +|     |   |--- create
>>>>>>>>>> +|     |   |--- name
>>>>>>>>>> +|     |   |--- available_instances
>>>>>>>>>> +|     |   |--- description /class
>>>>>>>>>> +|     |   |--- [devices]
>>>>>>>>>> +|     |--- <type id>
>>>>>>>>>> +|     |   |--- create
>>>>>>>>>> +|     |   |--- name
>>>>>>>>>> +|     |   |--- available_instances
>>>>>>>>>> +|     |   |--- description /class
>>>>>>>>>> +|     |   |--- [devices]
>>>>>>>>>> +|     |--- <type id>
>>>>>>>>>> +|          |--- create
>>>>>>>>>> +|          |--- name
>>>>>>>>>> +|          |--- available_instances
>>>>>>>>>> +|          |--- description /class
>>>>>>>>>> +|          |--- [devices]
>>>>>>>>>> +
>>>>>>>>>> +[TBD : description or class is yet to be decided. This will change.]      
>>>>>>>>>
>>>>>>>>> I thought that in previous discussions we had agreed to drop
>>>>>>>>> the <type id> concept and use the name as the unique identifier.
>>>>>>>>> When reporting these types in libvirt we won't want to report
>>>>>>>>> the type id values - we'll want the name strings to be unique.
>>>>>>>>>      
>>>>>>>>
>>>>>>>> The 'name' might not be unique but type_id will be. For example that Neo
>>>>>>>> pointed out in earlier discussion, virtual devices can come from two
>>>>>>>> different physical devices, end user would be presented with what they
>>>>>>>> had selected but there will be internal implementation differences. In
>>>>>>>> that case 'type_id' will be unique.
>>>>>>>>      
>>>>>>>
>>>>>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type
>>>>>>> string (if you still called it <type id>), and then no need of additional
>>>>>>> 'name' field which can be put inside 'description' field. See below quote:
>>>>>>>       
>>>>>>
>>>>>> We had internal discussions about this within NVIDIA and found that
>>>>>> 'name' might not be unique where as 'type_id' would be unique. I'm
>>>>>> refering to Neo's mail after that, where Neo do pointed that out.
>>>>>>
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html    
>>>>>
>>>>> Everyone not privy to those internal discussions, including me, seems to
>>>>> think we dropped type_id and that if a vendor does not have a stable
>>>>> name, they can compose some sort of stable type description based on the
>>>>> name+id, or even vendor+id, ex. NVIDIA-11.  So please share why we
>>>>> haven't managed to kill off type_id yet.  No matter what internal
>>>>> representation each vendor driver has of "type_id" it seems possible
>>>>> for it to come up with stable string to define a given configuration.    
>>>>
>>>>
>>>> The 'type_id' is unique and the 'name' are not, the name is just a
>>>> virtual device name/ human readable name. Because at this moment Intel
>>>> can't define a proper GPU class, we have to add a 'description' field
>>>> there as well to represent the features of this virtual device, once we
>>>> have all agreed with the GPU class and its mandatory attributes, the
>>>> 'description' field can be removed. Here is an example,
>>>> type_id/type_name = NVIDIA_11,
>>>> name=M60-M0Q,
>>>> description=2560x1600, 2 displays, 512MB"
>>>>
>>>> Neo's previous comment only applies to the situation where we will have
>>>> the GPU class or optional attributes defined and recognized by libvirt,
>>>> since that is not going to happen any time soon, we will have to have
>>>> the new 'description' field, and we don't want to have it mixed up with
>>>> 'name' field.
>>>>
>>>> We can definitely have something like name+id as Alex recommended to
>>>> remove the 'name' field, but it will just require libvirt to have more
>>>> logic to parse that string.  
>>>
>>> Let's use the mtty example driver provided in patch 5 so we can all
>>> more clearly see how the interfaces work.  I'll start from the
>>> beginning of my experience and work my way to the type/name thing.
>>>   
>>
>> Thanks for looking into it and getting feel of it. And I hope this helps
>> to understand that 'name' and 'type_id' are different.
>>
>>
>>> (please add a modules_install target to the Makefile)
>>>  
>>
>> This is an example and I feel it should not be installed in
>> /lib/modules/../build path. This should be used to understand the
>> interface and the flow of mdev device management life cycle. User can
>> use insmod to load driver:
>>
>> # insmod mtty.ko
> 
> It's not built by default, that's sufficient.  Providing a
> modules_install target makes it more accessible for testing and allows
> easier testing of module dependencies with modprobe.  insmod does not
> exercise the automatic module dependency loading.
> 
>>> # modprobe mtty
>>>
>>> Now what?  It seems like I need to have prior knowledge that this
>>> drivers supports mdev devices and I need to go hunt for them.  We need
>>> to create a class (ex. /sys/class/mdev/) where a user can find all the
>>> devices that participate in this mediated device infrastructure.  That
>>> would point me to /sys/devices/mtty.
>>>   
>>
>> You can find devices registered to mdev framework by searching for
>> 'mdev_supported_types' directory at the leaf nodes of devices in
>> /sys/devices directory. Yes, we can have 'mdev' class and links to
>> devices which are registered to mdev framework in /sys/class/mdev/.
>>
>>
>>> # tree /sys/devices/mtty
>>> /sys/devices/mtty
>>> |-- mdev_supported_types
>>> |   `-- mtty1
>>> |       |-- available_instances (1)
>>> |       |-- create
>>> |       |-- devices
>>> |       `-- name ("Dual-port-serial")
>>> |-- mtty_dev
>>> |   `-- sample_mtty_dev ("This is phy device")
>>> |-- power
>>> |   |-- async
>>> |   |-- autosuspend_delay_ms
>>> |   |-- control
>>> |   |-- runtime_active_kids
>>> |   |-- runtime_active_time
>>> |   |-- runtime_enabled
>>> |   |-- runtime_status
>>> |   |-- runtime_suspended_time
>>> |   `-- runtime_usage
>>> `-- uevent
>>>
>>> Ok, but that was boring, we really need to have at least 2 supported
>>> types to validate the interface, so without changing the actual device
>>> backing, I pretended to have a single port vs dual port:
>>>
>>> /sys/devices/mtty
>>> |-- mdev_supported_types
>>> |   |-- mtty1
>>> |   |   |-- available_instances (24)
>>> |   |   |-- create
>>> |   |   |-- devices
>>> |   |   `-- name (Single-port-serial)
>>> |   `-- mtty2
>>> |       |-- available_instances (12)
>>> |       |-- create
>>> |       |-- devices
>>> |       `-- name (Dual-port-serial)
>>> [snip]
>>>
>>> I arbitrarily decides I have 24 ports and each single port uses 1 port
>>> and each dual port uses 2 ports.
>>>
>>> Before I start creating devices, what are we going to key the libvirt
>>> XML on?  Can we do anything to prevent vendors from colliding or do we
>>> have any way to suggest meaningful and unique type_ids?   
>>
>> Libvirt would have parent and type_id in XML. No two vendors can own
>> same parent device. So I don't think vendors would collide even having
>> same type_id, since <parent, type_id> pair would always be unique.
> 
> 
> We have a goal of supporting migration with mdev devices, Intel has
> already shown this is possible.  Tying the XML representation of an
> mdev device to a parent device is directly contradictory to that goal.
> libvirt needs a token which is unique across vendor to be able to
> instantiate an mdev device.  <parent, type_id> is unacceptable.
>  

Ok, I'll have dev->driver->name prefix for type_id from mdev core
module. In vendor driver attribute functions, they should also use same
format to identify the type.

>>  Presumably if
>>> we had a PCI device hosting this, we would be rooted at that parent
>>> device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0.  Maybe
>>> the type_id should automatically be prefixed by the vendor module name,
>>> ex. mtty-1, i915-foo, nvidia-bar.  There's something missing for
>>> deterministically creating a "XYZ" device and knowdev->driver->nameing exactly what that
>>> means and finding a parent device that supports it.
>>>   
>>
>> We can prefix type_id with module name, i.e using , but
>> <parent, type_id> pair is unique so I don't see much benefit in doing
>> that.
>>
>>
>>> Let's get to mdev creating...
>>>
>>> # uuidgen > mdev_supported_types/mtty2/create
>>> # tree /sys/devices/mtty
>>> /sys/devices/mtty
>>> |-- e68189be-700e-41f7-93a3-b5351e79c470
>>> |   |-- driver -> ../../../bus/mdev/drivers/vfio_mdev
>>> |   |-- iommu_group -> ../../../kernel/iommu_groups/63
>>> |   |-- mtty2 -> ../mdev_supported_types/mtty2
>>> |   |-- power
>>> |   |   |-- async
>>> |   |   |-- autosuspend_delay_ms
>>> |   |   |-- control
>>> |   |   |-- runtime_active_kids
>>> |   |   |-- runtime_active_time
>>> |   |   |-- runtime_enabled
>>> |   |   |-- runtime_status
>>> |   |   |-- runtime_suspended_time
>>> |   |   `-- runtime_usage
>>> |   |-- remove
>>> |   |-- subsystem -> ../../../bus/mdev
>>> |   |-- uevent
>>> |   `-- vendor
>>> |       `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470")
>>> |-- mdev_supported_types
>>> |   |-- mtty1
>>> |   |   |-- available_instances (22)
>>> |   |   |-- create
>>> |   |   |-- devices
>>> |   |   `-- name
>>> |   `-- mtty2
>>> |       |-- available_instances (11)
>>> |       |-- create
>>> |       |-- devices
>>> |       |   `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470
>>> |       `-- name
>>>
>>> The mdev device was created directly under the parent, which seems like
>>> it's going to get messy to me (ie. imagine dropping a bunch of uuids
>>> into a PCI parent device's sysfs directory, how does a user know what
>>> they are?).
>>>   
>>
>> That is the way devices are placed in sysfs. For example below devices:
>>
>> 80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
>> Root Port 1a (rev 07)
>> 80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
>> Root Port 2a (rev 07)
>> 80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express
>> Root Port 3a in PCI Express Mode (rev 07)
>> 80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 0 (rev 07)
>> 80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 1 (rev 07)
>> 80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 2 (rev 07)
>> 80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 3 (rev 07)
>> 80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 4 (rev 07)
>> 80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 5 (rev 07)
>> 80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 6 (rev 07)
>> 80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel
>> 7 (rev 07)
>> 80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address
>> Map, VTd_Misc, System Management (rev 07)
>> 80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control
>> Status and Global Errors (rev 07)
>> 80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07)
>>
>> In sysfs, those are in same parent folder of its parent root port:
>>
>> # ls /sys/devices/pci0000\:80/ -l
>> total 0
>> drwxr-xr-x 8 root root    0 Oct 13 12:08 0000:80:01.0
>> drwxr-xr-x 7 root root    0 Oct 13 12:08 0000:80:02.0
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:03.0
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.0
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.1
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.2
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.3
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.4
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.5
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.6
>> drwxr-xr-x 6 root root    0 Oct 13 12:08 0000:80:04.7
>> drwxr-xr-x 3 root root    0 Oct 13 12:08 0000:80:05.0
>> drwxr-xr-x 3 root root    0 Oct 13 12:08 0000:80:05.2
>> drwxr-xr-x 3 root root    0 Oct 13 12:08 0000:80:05.4
>> lrwxrwxrwx 1 root root    0 Oct 13 13:25 firmware_node ->
>> ../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01
>> drwxr-xr-x 3 root root    0 Oct 13 12:08 pci_bus
>> drwxr-xr-x 2 root root    0 Oct 13 12:08 power
>> -rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent
> 
> I agree, that's also messy, but a PCI device has a standard PCI address
> format, so just by looking at those you know it's a child PCI device.
> We can write code that understands how to parse that and understands
> what it is by the address.  On the other hand we're dropping uuids into
> a directory.  We can write code that understands how to parse it, but
> how do we know that it's a child device vs some other attribute for the
> parent?
> 


mdev child will have 'mdev_supported_type' link in its directory. That
should be sufficient to identify it's child and not other attribute.


>>> Under the device we have "mtty2", shouldn't that be
>>> "mdev_supported_type", which then links to mtty2?  Otherwise a user
>>> needs to decode from the link what this attribute is.
>>>   
>>
>> I thought it should show type, so that by looking at 'ls' output user
>> should be able to find type_id.
> 
> The type_id should be shown by actually reading the link, not by the
> link name itself, the same way that the iommu_group link for a device
> isn't the group number, it links to the group number but uses a
> standard link name.
> 

Ok. I'll rename the link name to 'mdev_supported_type'

>>> Also here's an example of those vendor sysfs entries per device.  So
>>> long as the vendor never expects a tool like libvirt to manipulate
>>> attributes there, I can see how that could be pretty powerful.
>>>   
>>
>> Yes, it is good to have vendor specific entries, libvirt might not
>> report/use it. That would be more useful for system admin to get extra
>> information manually that libvirt doesn't report.
>>
>>
>>> Moving down to the mdev_supported_types, I've updated mtty so that it
>>> actually adjusts available instance, and we can now see a link under
>>> the devices for mtty2.
>>>
>>> Also worth noting is that a link for the device appears
>>> in /sys/bus/mdev/devices.
>>>
>>> BTW, specifying this device for QEMU vfio-pci is where the sysfsdev
>>> option comes into play:
>>>
>>> -device
>>> vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470
>>>
>>> Which raises another question, we can tell through the vfio interfaces
>>> that this is exposes as a PCI device, by creating a container
>>> (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)),
>>> adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)),
>>> getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally
>>> getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the
>>> flag bit that says the API is PCI.  That's a long path to go and has
>>> stumbling blocks like the type of iommu that's available for the given
>>> platform.  How do we make that manageable?   
>>
>> Do you want device type to be expressed in sysfs? Then that should be
>> done from vendor driver. vfio_mdev module is now a shim layer, so mdev
>> core or vfio_mdev module don't know what device type flag vendor driver
>> had set.
> 
> Right, the vendor driver would need to expose this, the mdev layers are
> device agnostic, they don't know or care which device API is being
> exposed.  The other question is whether it needs to be part of the
> initial implementation or can we assume pci for now and add something
> later.  I guess we already have our proof to the contrary with the
> IBM ccw device that libvirt can't simply assume pci.  I see that many
> devices in sysfs have a subsystem link, which seems rather appropriate,
> but we're not creating a real pci device, so linking to /sys/bus/pci
> or /sys/class/pci_bus both seem invalid.  Is that a dead end?  We could
> always expose vfio_device_info.flags, but that seems pretty ugly as
> well, plus the sysfs mdev interface is not vfio specific.  What if we
> had a "device_api" attribute which the vendor driver would show as
> "vfio-pci"?  Therefore the mdev interface is not tied to vfio, but we
> clearly show that a given type_id clearly exports a vfio-pci
> interface.  Thanks,
> 

We can't use subsystem for mdev device. Kernel's device core framework
adds subsystem link to mdev device folder as:
 subsystem -> ../../../../../../../bus/mdev

We will have an mandatory attribute in 'supported_type_groups' which
should show "vfio-pci" or "vfio-platform" based on the device flag
vendor driver is going to set for that type.

Thanks,
Kirti

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux