RFC: vfio / device assignment -- layout of device fd files

Alex Graf, Scott Wood, and I met last week to try to flesh out
some details as to how vfio could work for non-PCI devices,
like we have in embedded systems.  This most likely will
require a different kernel driver than vfio.  For now we are
calling it "dtio" (for device tree I/O), as there is no way
to discover these devices except from the device tree.  But
the dtio driver would use the same architecture and interfaces
as vfio.

For devices on a system bus and represented in a device
tree we have some different requirements than PCI for what
is exposed in the device fd file.  A device may have multiple
address regions, multiple interrupts, and a variable-length
device tree path, and user space needs to know whether each
region is mmapable, etc.

With existing vfio, the device fd file layout is something
like:
  0xF Config space offset
  ...
  0x6 ROM offset
  0x5 BAR 5 offset
  0x4 BAR 4 offset
  0x3 BAR 3 offset
  0x2 BAR 2 offset
  0x1 BAR 1 offset
  0x0 BAR 0 offset
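
The listing above gives only an index per resource; how an index
becomes a byte offset in the device fd is not spelled out here.  As
a rough sketch of how a fixed layout like this can work, assume the
index sits in the high bits of the file offset.  The shift value and
the helper names below are hypothetical, chosen only for illustration:

```python
# Sketch of a fixed, index-based device fd layout.
# ASSUMPTION: REGION_SHIFT is hypothetical; the post does not give it.
REGION_SHIFT = 32

# Indexes taken from the listing above (0x7..0xE are elided there).
REGION_NAMES = {
    0x0: "BAR 0", 0x1: "BAR 1", 0x2: "BAR 2",
    0x3: "BAR 3", 0x4: "BAR 4", 0x5: "BAR 5",
    0x6: "ROM",
    0xF: "Config space",
}


def region_base(index):
    """Return the file offset at which the given region index starts."""
    return index << REGION_SHIFT


def split_offset(file_offset):
    """Split a file offset into (region index, offset within region)."""
    mask = (1 << REGION_SHIFT) - 1
    return file_offset >> REGION_SHIFT, file_offset & mask


index, inner = split_offset(region_base(0xF) + 0x10)
```

With a scheme like this, user space has to know in advance what each
index means, which is the limitation the proposal below tries to lift.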

We have an alternate proposal that we think is more flexible,
extensible, and will accommodate both PCI and system bus
type devices (and others).

Instead of config space fixed at 0xf, we would propose
a header and multiple 'device info' records at offset 0x0 that would
encode everything that user space needs to know about
the device:

  0x0  +-------------+-------------+
       | magic       |   version   | u64   // magic u64 identifies the type of
       |   "vfio"    |             |       // passthru I/O, plus version #
       |   "dtio"    |             |       //   "vfio" - PCI devices
       +-------------+-------------+       //   "dtio" - device tree devices
       |         flags             | u32   // encodes any flags (TBD)
       +---------------------------+
       |  dev info record N        |
       |    type                   | u32   // type of record
       |    rec_len                | u32   // length in bytes of record
       |                           |       // (including record header)
       |    flags                  | u32   // type specific flags
       |    ...content...          |       // record content, which could
       +---------------------------+       // include sub-records
       |  dev info record N+1      |
       +---------------------------+
       |  dev info record N+2      |
       +---------------------------+
       ...
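
As a rough sketch of how user space might walk this layout, the
following reads the header and then steps from record to record
using rec_len.  Several details are assumptions and not part of the
proposal: little-endian byte order, the u64 being split into a 4-byte
ASCII magic followed by a u32 version, records starting immediately
after the header flags, and the buffer ending where the records end.
The record type numbers used in the example are also made up.

```python
import struct

# ASSUMPTION: little-endian, 4-byte ASCII magic + u32 version + u32 flags.
HEADER = struct.Struct("<4sII")
# Every record starts with u32 type, u32 rec_len, u32 flags.
REC_HEADER = struct.Struct("<III")


def build_record(rec_type, flags, content=b""):
    """Pack one record; rec_len counts the record header too."""
    rec_len = REC_HEADER.size + len(content)
    return REC_HEADER.pack(rec_type, rec_len, flags) + content


def parse_device_info(buf):
    """Return (magic, version, flags, [(type, flags, content), ...])."""
    magic, version, flags = HEADER.unpack_from(buf, 0)
    if magic not in (b"vfio", b"dtio"):
        raise ValueError("unknown magic %r" % magic)
    records = []
    pos = HEADER.size
    while pos < len(buf):
        if len(buf) - pos < REC_HEADER.size:
            raise ValueError("truncated record header")
        rec_type, rec_len, rec_flags = REC_HEADER.unpack_from(buf, pos)
        # rec_len must cover at least the header and stay inside buf.
        if rec_len < REC_HEADER.size or pos + rec_len > len(buf):
            raise ValueError("bad rec_len %d" % rec_len)
        content = buf[pos + REC_HEADER.size:pos + rec_len]
        records.append((rec_type, rec_flags, content))
        pos += rec_len
    return magic.decode("ascii"), version, flags, records


# Hypothetical type numbers 1 and 4, for illustration only.
blob = HEADER.pack(b"dtio", 1, 0) + build_record(1, 0, b"abcd") + build_record(4, 2)
info = parse_device_info(blob)
```

Because each record carries its own length, a parser can skip record
types it does not understand, which is what makes the format
extensible.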

The device info records following the file header have the following
record types, each with content encoded in a record-specific way:

 REGION           - describes an address range of the device
 DTPATH           - describes the device tree path for the device
 DTINDEX          - describes the index into the related device tree
                    property (reg, ranges, interrupts, interrupt-map)
 INTERRUPT        - describes an interrupt for the device
 PCI_CONFIG_SPACE - describes config space for the device
 PCI_INFO         - domain:bus:device:func
 PCI_BAR_INFO     - information about the BAR for a device

For a device tree type device the file may look like:

 0x0+---------------------------+
    |        header             |      
    +---------------------------+
    |   type = REGION           |      
    |   rec_len                 |      
    |   flags =                 | u32 // region specific flags
    |       is_mmapable         | 
    |   offset                  | u64 // seek offset to region
    |                           |        from beginning
    |   len                     | u64 // length of region
    |   addr                    | u64 // phys addr of region
    |                           |      
    +---------------------------+
     \   type = DTPATH          \  // a sub-record
      |   rec_len                |      
      |   flags                  |      
      |   dev tree path          | char[] // device tree path
    +---------------------------+
     \   type = DTINDEX         \  // a sub-record
      |   rec_len                |      
      |   flags                  |      
      |   prop_type              | u32  // REG, RANGES
      |   prop_index             | u32  // index  into resource list
    +---------------------------+
    |  type = INTERRUPT         |      
    |  rec_len                  |      
    |  flags                    | u32 
    |  ioctl_handle             | u32 // argument to ioctl to get interrupts
    |                           |      
    +---------------------------+
     \   type = DTPATH         \  // a sub-record
      |   rec_len               |      
      |   flags                 |      
      |   dev tree path         |  char[] // device tree path
    +---------------------------+
     \   type = DTINDEX        \  // a sub-record
      |   rec_len               |      
      |   flags                 |      
      |   prop_type             | u32 // INTERRUPT,INTERRUPT_MAP
      |   prop_index            | u32 // index
    +---------------------------+
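
To make the nesting concrete, here is a rough sketch that packs and
then decodes one REGION record with DTPATH and DTINDEX sub-records.
Everything numeric is an assumption made for illustration: the record
type values, the is_mmapable flag bit, the PROP_REG value,
little-endian byte order, and the sub-records following directly
after the addr field.  The device tree path is a made-up example.

```python
import struct

REC_HEADER = struct.Struct("<III")    # type, rec_len, flags
REGION_BODY = struct.Struct("<QQQ")   # offset, len, addr (all u64)
DTINDEX_BODY = struct.Struct("<II")   # prop_type, prop_index

# ASSUMPTION: hypothetical numeric values, not defined by the proposal.
REGION, DTPATH, DTINDEX = 1, 2, 3
REGION_FLAG_MMAPABLE = 0x1
PROP_REG = 0


def pack_record(rec_type, flags, content):
    """Pack a record; rec_len includes the 12-byte record header."""
    return REC_HEADER.pack(rec_type, REC_HEADER.size + len(content), flags) + content


def iter_records(buf):
    """Yield (type, flags, content) for each record packed in buf."""
    pos = 0
    while pos < len(buf):
        rec_type, rec_len, flags = REC_HEADER.unpack_from(buf, pos)
        if rec_len < REC_HEADER.size or pos + rec_len > len(buf):
            raise ValueError("bad rec_len %d" % rec_len)
        yield rec_type, flags, buf[pos + REC_HEADER.size:pos + rec_len]
        pos += rec_len


def parse_region(content, flags):
    """Decode REGION content, including any sub-records after addr."""
    offset, length, addr = REGION_BODY.unpack_from(content, 0)
    region = {"offset": offset, "len": length, "addr": addr,
              "mmapable": bool(flags & REGION_FLAG_MMAPABLE)}
    for sub_type, _sub_flags, sub in iter_records(content[REGION_BODY.size:]):
        if sub_type == DTPATH:
            region["dt_path"] = sub.decode("ascii")
        elif sub_type == DTINDEX:
            region["prop_type"], region["prop_index"] = DTINDEX_BODY.unpack(sub)
    return region


subs = (pack_record(DTPATH, 0, b"/soc/serial@4500")
        + pack_record(DTINDEX, 0, DTINDEX_BODY.pack(PROP_REG, 0)))
record = pack_record(REGION, REGION_FLAG_MMAPABLE,
                     REGION_BODY.pack(0x1000, 0x100, 0xFFE04500) + subs)
(rec_type, rec_flags, body), = iter_records(record)
region = parse_region(body, rec_flags)
```

The same record walker serves both levels, since a sub-record uses
the same type/rec_len/flags header as a top-level record.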


PCI devices would have a PCI specific encoding.  Instead of
config space and the mappable BAR regions being at specific
predetermined offsets, the device info records would describe
this.  Something like:

0x0 +---------------------------+
    |   type = PCI_CONFIG_SPACE |      
    |   rec_len                 |      
    |   flags = 0x0             |      
    |   offset                  | u64 // seek offset to config space
    |                           |        from beginning
    |   config_space_len        | u32 // length of config space
    +---------------------------+
    |   type = PCI_INFO         |      
    |   rec_len                 |      
    |   flags = 0x0             |      
    |   dom:bus:dev:func        | u32 // pci device info
    +---------------------------+
    |   type = REGION           |      
    |   rec_len                 |      
    |   flags =                 |      
    |       is_mmapable         |      
    |   offset                  | u64 // seek offset to region
    |                           |        from beginning
    |   len                     | u64 // length of region
    |   addr                    | u64 // physical addr of region
    +---------------------------+
     \   type = PCI_BAR_INFO    \      
      |   rec_len                |      
      |   flags                  |      
      |   bar_type               |  // pio
      |                          |  // prefetchable mmio
      |                          |  // non-prefetchable mmio
      |   bar_index              |  // index of bar in device
      +--------------------------+
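
The PCI_INFO record above stores dom:bus:dev:func in a single u32,
but the bit layout is not specified.  One packing that happens to
fit exactly, shown here only as an assumption, is 16 bits of domain,
8 bits of bus, 5 bits of device, and 3 bits of function.  The helper
names are hypothetical:

```python
# ASSUMPTION: field widths 16/8/5/3; the proposal does not define them.
def pack_pci_info(domain, bus, dev, func):
    """Pack domain:bus:dev.func into one 32-bit value."""
    if not (0 <= domain <= 0xFFFF and 0 <= bus <= 0xFF
            and 0 <= dev <= 0x1F and 0 <= func <= 0x7):
        raise ValueError("PCI address field out of range")
    return (domain << 16) | (bus << 8) | (dev << 3) | func


def unpack_pci_info(value):
    """Return (domain, bus, dev, func) from a packed 32-bit value."""
    return ((value >> 16) & 0xFFFF, (value >> 8) & 0xFF,
            (value >> 3) & 0x1F, value & 0x7)


def format_pci_info(value):
    """Render the packed value in the usual dddd:bb:dd.f notation."""
    return "%04x:%02x:%02x.%x" % unpack_pci_info(value)


packed = pack_pci_info(0, 0x03, 0x1F, 0x7)
text = format_pci_info(packed)
```

A domain wider than 16 bits would not fit in this scheme, so the
record would need a larger field in that case.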

There may be other, more complex device or bus types that
need their own special encodings, and this format would
allow new record types to be defined for them.  Two
other types that come to mind are Serial RapidIO buses,
commonly used in our networking SoCs, and the FSL DPAA
portals, which are very strange devices that may require
their own unique interface exposed to user space.

In short, when user space opens up a device fd it needs
some information about what this device is, and this
proposal tries to address that.

Regards,
Stuart Yoder


