Re: RFC [v2]: vfio / device assignment -- layout of device fd files

On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
> 
> ===============================================================
> 
> 1.  Overview
> 
>   This specification describes the layout of device files
>   used in the context of vfio, which gives user space
>   direct access to I/O devices that have been bound to
>   vfio.
> 
>   When a device fd is opened and read, offset 0x0 contains
>   a fixed sized header followed by a number of variable length
>   records that describe different characteristics
>   of the device-- addressable regions, interrupts, etc.
> 
>   0x0  +-------------+-------------+
>        |         magic             | u32  // identifies this as a vfio device
>        +---------------------------+      // file and identifies the type of bus
>        |         version           | u32  // specifies the version of this
>        +---------------------------+
>        |         flags             | u32  // encodes any flags
>        +---------------------------+
>        |  dev info record 0        |
>        |    type                   | u32   // type of record
>        |    rec_len                | u32   // length in bytes of record
>        |                           |          (including record header)
>        |    flags                  | u32   // type specific flags
>        |    ...content...          |       // record content, which could
>        +---------------------------+       // include sub-records
>        |  dev info record 1        |
>        +---------------------------+
>        |  dev info record N        |
>        +---------------------------+
> 
>   The device info records following the file header may have
>   the following record types each with content encoded in
>   a record specific way:
> 
>   ------------+-------+------------------------------------------------------
>               |  type |
>    Record     |  num  | Description
>   ---------------------------------------------------------------------------
>   REGION           1    describes an addressable address range for the device
>   DTPATH           2    describes the device tree path for the device
>   DTINDEX          3    describes the index into the related device tree
>                           property (reg,ranges,interrupts,interrupt-map)
>   INTERRUPT        4    describes an interrupt for the device
>   PCI_CONFIG_SPACE 5    property identifying a region as PCI config space
>   PCI_BAR_INDEX    6    describes the BAR index for a PCI region
>   PHYS_ADDR        7    describes the physical address of the region
>   ---------------------------------------------------------------------------
> 
> 2. Header
> 
> The header is located at offset 0x0 in the device fd
> and has the following format:
> 
>     struct devfd_header {
>         __u32 magic;
>         __u32 version;
>         __u32 flags;
>     };
> 
>     The 'magic' field contains a magic value that
>     identifies the type of bus the device is on.  Valid
>     values are:
> 
>         0x70636900   // "pci" - PCI device
>         0x64740000   // "dt" - device tree (system bus)
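For what it's worth, the two magic values read as ASCII tags packed from the most significant byte down, so user space could decode them without a lookup table. A rough sketch follows; the macro and function names (DEVFD_MAGIC_*, devfd_bus_name, devfd_magic_to_tag) are made up for illustration and are not part of the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic values copied from the proposal; the macro names are hypothetical. */
#define DEVFD_MAGIC_PCI 0x70636900u  /* 'p' 'c' 'i' '\0' */
#define DEVFD_MAGIC_DT  0x64740000u  /* 'd' 't' '\0' '\0' */

/* Map a magic value to a bus name, or NULL if the value is not recognized. */
static const char *devfd_bus_name(uint32_t magic)
{
    switch (magic) {
    case DEVFD_MAGIC_PCI: return "pci";
    case DEVFD_MAGIC_DT:  return "dt";
    default:              return NULL;
    }
}

/* Unpack the four tag bytes, most significant first; out must hold 5 bytes. */
static void devfd_magic_to_tag(uint32_t magic, char *out)
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((magic >> (24 - 8 * i)) & 0xff);
    out[4] = '\0';
}
```

This assumes the header has already been read into a host-order u32; the proposal does not say anything about byte order on the wire.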
> 
> 3. Region
> 
>   A REGION record describes an addressable region for the device.
> 
>     struct devfd_region {
>         __u32 type;   // must be 0x1
>         __u32 record_len;
>         __u32 flags;
>         __u64 offset; // seek offset to region from beginning
>                       // of file
>         __u64 len   ; // length of the region
>     };
> 
>   The 'flags' field supports one flag:
> 
>       IS_MMAPABLE
> 
> 4. Device Tree Path (DTPATH)
> 
>   A DTPATH record is a sub-record of a REGION and describes
>   the path to a device tree node for the region

Can we better distinguish sub-records from records?  I assume we're
trying to be as versatile as possible by having a single "type" address
space, but is this going to lead to implementation problems?  A DTPATH
as a record, an INTERRUPT as a sub-record, etc.  Should we instead have
a "subtype" address space per "type" and per device type?  For a "dt"
device, it looks like we really have:

      * REGION (type 0)
              * DTPATH (subtype 0)
              * DTINDEX (subtype 1)
              * PHYS_ADDR (subtype 2)
      * INTERRUPT (type 1)
              * DTPATH (subtype 0)
              * DTINDEX (subtype 1)

While "pci" is:

      * REGION (type 0)
              * PCI_CONFIG_SPACE (subtype 0)
              * PCI_BAR_INDEX (subtype 1)
      * INTERRUPT (type 1)
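
To make the nesting concrete, here is a rough sketch of how user space might walk such a layout. It assumes only what every struct in the proposal shares, a leading type and rec_len, and that fields are in host byte order. The names devfd_rec_hdr, devfd_next and devfd_count are hypothetical. The same walker serves both levels: the caller knows which parent it is inside, so a subtype value only has to be unique within that parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Prefix common to every record and sub-record in the proposal. */
struct devfd_rec_hdr {
    uint32_t type;     /* record type, or subtype when nested */
    uint32_t rec_len;  /* length in bytes, including this header */
};

/*
 * Read the header at offset off within buf[0..end).
 * Returns the offset of the following entry, or 0 if the entry is
 * truncated or malformed (0 is never a valid "next" offset).
 */
static size_t devfd_next(const uint8_t *buf, size_t end, size_t off,
                         struct devfd_rec_hdr *hdr)
{
    assert(buf != NULL && hdr != NULL);
    if (off > end || end - off < sizeof(*hdr))
        return 0;
    memcpy(hdr, buf + off, sizeof(*hdr));  /* avoids unaligned loads */
    if (hdr->rec_len < sizeof(*hdr) || hdr->rec_len > end - off)
        return 0;
    return off + hdr->rec_len;
}

/* Count entries of one (sub)type in buf[start..end); -1 if malformed. */
static int devfd_count(const uint8_t *buf, size_t start, size_t end,
                       uint32_t type)
{
    struct devfd_rec_hdr hdr;
    size_t off = start;
    int n = 0;

    while (off < end) {
        size_t next = devfd_next(buf, end, off, &hdr);
        if (next == 0)
            return -1;
        if (hdr.type == type)
            n++;
        off = next;
    }
    return n;
}
```

A caller would run devfd_count() once over the whole record area, then again over the sub-record span of each REGION or INTERRUPT it finds. Where that span starts depends on the fixed part of each record type, which is left out here.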

>     struct devfd_dtpath {
>         __u32 type;   // must be 0x2
>         __u32 record_len;
>         char path[];  // path to the device tree node
>     };
> 
> 5. Device Tree Index (DTINDEX)
> 
>   A DTINDEX record is a sub-record of a REGION and specifies
>   the index into the resource list encoded in the associated
>   device tree property-- "reg", "ranges", "interrupts", or
>   "interrupt-map".
> 
>     struct devfd_dtindex {
>         __u32 type;   // must be 0x3
>         __u32 record_len;
>         __u32 prop_type;
>         __u32 prop_index;  // index into the resource list
>     };
> 
>     prop_type must have one of the following values:
>        1   // "reg" property
>        2   // "ranges" property
>        3   // "interrupts" property
>        4   // "interrupt-map" property
> 
>     Note: prop_index is not the byte offset into the property,
>     but the logical index.
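
To illustrate the distinction for the "reg" case: each "reg" entry is (#address-cells + #size-cells) 32-bit cells wide, with both counts taken from the parent node, so a consumer holding the raw property could turn the logical index into a byte offset roughly as below. The function name is hypothetical and the other property types have different entry widths, so treat this as a sketch for "reg" only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Byte offset of entry prop_index within a raw "reg" property.
 * address_cells and size_cells are the parent's #address-cells and
 * #size-cells values; each cell is one 32-bit word.
 */
static size_t dt_reg_entry_offset(uint32_t prop_index,
                                  uint32_t address_cells,
                                  uint32_t size_cells)
{
    /* A zero-width entry would make every index map to offset 0. */
    assert(address_cells + size_cells > 0);
    return (size_t)prop_index * (address_cells + size_cells)
           * sizeof(uint32_t);
}
```

With two address cells and one size cell, for example, each entry is 12 bytes and logical index 2 starts at byte 24.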
> 
> 6. Interrupts (INTERRUPT)
> 
>   An INTERRUPT record describes one of a device's interrupts.
>   The handle field is an argument to VFIO_DEVICE_GET_IRQ_FD
>   which user space can use to receive device interrupts.
> 
>     struct devfd_interrupts {
>         __u32 type;   // must be 0x4
>         __u32 record_len;
>         __u32 flags;
>         __u32 handle;  // parameter to VFIO_DEVICE_GET_IRQ_FD
>     };

I'm still on the fence about whether we should implement INTERRUPT for
PCI, or only assume handle 0x0, or maybe assume handle == interrupt pin.

> 
> 7.  PCI Config Space (PCI_CONFIG_SPACE)
> 
>     A PCI_CONFIG_SPACE record is a sub-record of a REGION record
>     and identifies the region as PCI configuration space.
> 
>     struct devfd_cfgspace {
>         __u32 type;   // must be 0x5
>         __u32 record_len;
>         __u32 flags;
>     };
> 
> 8.  PCI Bar Index (PCI_BAR_INDEX)
> 
>     A PCI_BAR_INDEX record is a sub-record of a REGION record
>     and identifies the PCI BAR index for the region.
> 
>     struct devfd_barindex {
>         __u32 type;   // must be 0x6
>         __u32 record_len;
>         __u32 flags;
>         __u32 bar_index;
>     };

I suppose we're more concerned with easy parsing and alignment than
compactness, so a u32 to differentiate 6 BARs + 1 ROM is probably ok.
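
As a quick check on the alignment point, an all-u32 record has no padding on common ABIs. A minimal sketch, where numbering the ROM as index 6 after BARs 0-5 is my assumption rather than something the proposal states, and devfd_bar_index_valid is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct devfd_barindex from the proposal using stdint types. */
struct devfd_barindex {
    uint32_t type;        /* 0x6 in the proposal */
    uint32_t record_len;
    uint32_t flags;
    uint32_t bar_index;
};

/* Four u32 fields pack without padding, so the layout is 16 bytes. */
static_assert(sizeof(struct devfd_barindex) == 16, "unexpected padding");
static_assert(offsetof(struct devfd_barindex, bar_index) == 12,
              "unexpected field offset");

/* Assumption: indices 0-5 are the BARs and 6 is the expansion ROM. */
#define DEVFD_BAR_ROM 6u

/* Returns nonzero if bar_index names one of the 6 BARs or the ROM. */
static int devfd_bar_index_valid(uint32_t bar_index)
{
    return bar_index <= DEVFD_BAR_ROM;
}
```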

> 
> 9.  Physical Address (PHYS_ADDR)
> 
>     A PHYS_ADDR record is a sub-record of a REGION record
>     and specifies the physical address of the region.
> 
>     struct devfd_physaddr {
>         __u32 type;   // must be 0x7
>         __u32 record_len;
>         __u32 flags;
>         __u64 phys_addr;
>     };

Thanks,

Alex

