Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

On 26.09.2011 at 09:51, David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
>> Based on the discussions over the last couple of weeks
>> I have updated the device fd file layout proposal and
>> tried to specify it a bit more formally.
>> 
>> ===============================================================
>> 
>> 1.  Overview
>> 
>>  This specification describes the layout of device files
>>  used in the context of vfio, which gives user space
>>  direct access to I/O devices that have been bound to
>>  vfio.
>> 
>>  When a device fd is opened and read, offset 0x0 contains
>>  a fixed sized header followed by a number of variable length
>>  records that describe different characteristics
>>  of the device-- addressable regions, interrupts, etc.
>> 
>>  0x0  +-------------+-------------+
>>       |         magic             | u32  // identifies this as a vfio device file
>>       +---------------------------+         and identifies the type of bus
>>       |         version           | u32  // specifies the version of this
>>       +---------------------------+
>>       |         flags             | u32  // encodes any flags
>>       +---------------------------+
>>       |  dev info record 0        |
>>       |    type                   | u32   // type of record
>>       |    rec_len                | u32   // length in bytes of record
>>       |                           |          (including record header)
>>       |    flags                  | u32   // type specific flags
>>       |    ...content...          |       // record content, which could
>>       +---------------------------+       // include sub-records
>>       |  dev info record 1        |
>>       +---------------------------+
>>       |  dev info record N        |
>>       +---------------------------+
> 
> I really should have chimed in on this earlier, but I've been very
> busy.
> 
> Um, not to put too fine a point on it, this is madness.
> 
> Yes, it's very flexible and can thereby cover a very wide range of
> cases.  But it's much, much too complex.  Userspace has to parse a
> complex, multilayered data structure, with variable length elements
> just to get an address at which to do IO.  I can pretty much guarantee
> that if we went with this, most userspace programs using this
> interface would just ignore this metadata and directly map the
> offsets at which they happen to know the kernel will put things for
> the type of device they care about.
> 
> _At least_ for PCI, I think the original VFIO layout of each BAR at a
> fixed, well known offset is much better.  Despite its limitations,
> just advertising a "device type" ID which describes one of a few fixed
> layouts would be preferable to this.  I'm still hoping that we can do
> a bit better than that.  But we should try really hard to at the very
> least force the metadata into a simple array of resources each with a
> fixed size record describing it, even if it means some space wastage
> with occasionally-used fields.  Anything more complex than that and
> userspace is just never going to use it properly.

We will have two different types of user space. One wants to be as generic as possible and needs all this dynamic information; QEMU falls into this category.

The other is device-specific drivers in user space. There, hardcoding might make sense.

For the generic interface, we need something that is as verbose as possible and lets us enumerate all the device properties, so we can properly map and forward them to the guest.
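
To make the generic case concrete, here is a rough sketch of how such a consumer might walk the records from the quoted layout. It is only an illustration under assumptions: the proposal as quoted does not fix a byte order (little-endian is assumed here), it does not say how the reader learns the record count (the sketch assumes the buffer ends after the last record), and the magic and record type values in the demo are made up. The names parse_device_info and build_record are hypothetical helpers, not part of any vfio API.

```python
import struct

# Assumption: little-endian u32 fields; the quoted proposal does not say.
HEADER = struct.Struct("<III")      # magic, version, flags
REC_HEADER = struct.Struct("<III")  # type, rec_len, flags


def parse_device_info(buf):
    """Return (header, records) for a buffer read from offset 0x0.

    Assumes the buffer holds exactly the fixed header followed by
    back-to-back records, each rec_len bytes long (header included).
    """
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than the fixed header")
    magic, version, flags = HEADER.unpack_from(buf, 0)

    records = []
    offset = HEADER.size
    while offset < len(buf):
        if len(buf) - offset < REC_HEADER.size:
            raise ValueError("truncated record header")
        rtype, rec_len, rflags = REC_HEADER.unpack_from(buf, offset)
        # rec_len covers the record header, so it can never be smaller.
        if rec_len < REC_HEADER.size or offset + rec_len > len(buf):
            raise ValueError("invalid rec_len")
        content = bytes(buf[offset + REC_HEADER.size:offset + rec_len])
        records.append({"type": rtype, "flags": rflags, "content": content})
        offset += rec_len  # skip unknown record types without decoding them

    return {"magic": magic, "version": version, "flags": flags}, records


def build_record(rtype, rflags, content):
    """Hypothetical helper that packs one record, used only for the demo."""
    return REC_HEADER.pack(rtype, REC_HEADER.size + len(content), rflags) + content


# Demo with made-up values: magic 0x1234, two records of types 1 and 2.
demo = (HEADER.pack(0x1234, 2, 0)
        + build_record(1, 0, b"\x00" * 8)
        + build_record(2, 5, b""))
demo_header, demo_records = parse_device_info(demo)
```

Because rec_len includes the record header, a generic consumer can step over record types it does not understand, which is the main thing the variable-length format buys.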

However, nothing keeps us from always mapping BARs at static offsets into the file. If you don't need the generic info, you can simply ignore it.
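
For the device-specific case, a hardcoded lookup could be as small as the sketch below. The window size and the rule that window 0 is left for the metadata at offset 0x0 are purely assumptions for illustration; neither this thread nor the proposal defines concrete offsets. Only the count of six BARs comes from the standard PCI header.

```python
PCI_NUM_BARS = 6        # a standard (type 0) PCI header has six BARs
BAR_WINDOW_SHIFT = 40   # hypothetical: one 1 TiB window per BAR


def bar_file_offset(bar_index):
    """Return the assumed static file offset for a PCI BAR.

    Window 0 is left for the metadata that starts at offset 0x0,
    so BAR n lives in window n + 1 under this made-up scheme.
    """
    if not 0 <= bar_index < PCI_NUM_BARS:
        raise ValueError("BAR index out of range")
    return (bar_index + 1) << BAR_WINDOW_SHIFT
```

A driver that knows its device would pass such an offset straight to mmap() or pread() on the device fd and never look at the records at all.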

Also, if you can come up with an interface that does not have variable length descriptors but is still able to export all the required generic information, please send a proposal to the list :)


Alex