On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
>
> ===============================================================
>
> 1. Overview
>
> This specification describes the layout of device files
> used in the context of vfio, which gives user space
> direct access to I/O devices that have been bound to
> vfio.
>
> When a device fd is opened and read, offset 0x0 contains
> a fixed-size header followed by a number of variable-length
> records that describe different characteristics
> of the device: addressable regions, interrupts, etc.
>
> 0x0  +---------------------------+
>      | magic                     | u32  // identifies this as a vfio
>      |                           |      // device file and identifies
>      |                           |      // the type of bus
>      +---------------------------+
>      | version                   | u32  // specifies the version of this
>      +---------------------------+
>      | flags                     | u32  // encodes any flags
>      +---------------------------+
>      | dev info record 0         |
>      |    type                   | u32  // type of record
>      |    rec_len                | u32  // length in bytes of record
>      |                           |      // (including record header)
>      |    flags                  | u32  // type-specific flags
>      |    ...content...          |
>      |                           |      // record content, which could
>      +---------------------------+      // include sub-records
>      | dev info record 1         |
>      +---------------------------+
>      | dev info record N         |
>      +---------------------------+
>
> The device info records following the file header may have
> the following record types, each with content encoded in
> a record-specific way:
>
> ------------------+------+---------------------------------------------------
>  Record           | type | Description
>                   | num  |
> ------------------+------+---------------------------------------------------
>  REGION           |  1   | describes an addressable address range for the device
>  DTPATH           |  2   | describes the device tree path for the device
>  DTINDEX          |  3   | describes the index into the related device tree
>                   |      | property (reg, ranges, interrupts, interrupt-map)
>  INTERRUPT        |  4   | describes an interrupt for the device
>  PCI_CONFIG_SPACE |  5   | property identifying a region as PCI config space
>  PCI_BAR_INDEX    |  6   | describes the BAR index for a PCI region
>  PHYS_ADDR        |  7   | describes the physical address of the region
> ------------------+------+---------------------------------------------------
>
> 2. Header
>
> The header is located at offset 0x0 in the device fd
> and has the following format:
>
>   struct devfd_header {
>           __u32 magic;
>           __u32 version;
>           __u32 flags;
>   };
>
> The 'magic' field contains a magic value that identifies
> the type of bus the device is on. Valid values are:
>
>   0x70636900   // "pci" - PCI device
>   0x64740000   // "dt"  - device tree (system bus)
>
> 3. Region
>
> A REGION record describes an addressable address region
> for the device.
>
>   struct devfd_region {
>           __u32 type;         // must be 0x1
>           __u32 record_len;
>           __u32 flags;
>           __u64 offset;       // seek offset to region from beginning
>                               // of file
>           __u64 len;          // length of the region
>   };
>
> The 'flags' field supports one flag:
>
>   IS_MMAPABLE
>
> 4.
> Device Tree Path (DTPATH)
>
> A DTPATH record is a sub-record of a REGION and describes
> the path to a device tree node for the region.

Can we better distinguish sub-records from records?  I assume we're trying
to be as versatile as possible by having a single "type" address space,
but is this going to lead to implementation problems, e.g. a DTPATH
appearing as a top-level record, or an INTERRUPT appearing as a
sub-record?  Should we instead have a "subtype" address space per "type"
and per device type?  For a "dt" device, it looks like we really have:

 * REGION (type 0)
   * DTPATH (subtype 0)
   * DTINDEX (subtype 1)
   * PHYS_ADDR (subtype 2)
 * INTERRUPT (type 1)
   * DTPATH (subtype 0)
   * DTINDEX (subtype 1)

While "pci" is:

 * REGION (type 0)
   * PCI_CONFIG_SPACE (subtype 0)
   * PCI_BAR_INDEX (subtype 1)
 * INTERRUPT (type 1)

>   struct devfd_dtpath {
>           __u32 type;         // must be 0x2
>           __u32 record_len;
>           char  path[];       // device tree path (variable length)
>   };
>
> 5. Device Tree Index (DTINDEX)
>
> A DTINDEX record is a sub-record of a REGION and specifies
> the index into the resource list encoded in the associated
> device tree property: "reg", "ranges", "interrupts", or
> "interrupt-map".
>
>   struct devfd_dtindex {
>           __u32 type;         // must be 0x3
>           __u32 record_len;
>           __u32 prop_type;
>           __u32 prop_index;   // index into the resource list
>   };
>
> prop_type must have one of the following values:
>
>   1   // "reg" property
>   2   // "ranges" property
>   3   // "interrupts" property
>   4   // "interrupt-map" property
>
> Note: prop_index is not the byte offset into the property,
> but the logical index.
>
> 6. Interrupts (INTERRUPT)
>
> An INTERRUPT record describes one of a device's interrupts.
> The handle field is an argument to VFIO_DEVICE_GET_IRQ_FD
> which user space can use to receive device interrupts.
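A minimal sketch, in C, of how user space might gather the handles it would later pass to VFIO_DEVICE_GET_IRQ_FD. It assumes the records sit back to back after the 12-byte header, that every record starts with the u32 type and u32 record_len shared by all the structs in this proposal, that an INTERRUPT record uses the devfd_interrupts layout given just below (handle at byte offset 12), and that the file is in host byte order. The function name collect_irq_handles is hypothetical; the ioctl itself is not called, and sub-records nested inside a REGION are not visited.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVFD_HEADER_LEN	12u	/* magic, version, flags: 3 x u32 */
#define DEVFD_REC_HDR_LEN	8u	/* type + record_len, common to all records */
#define DEVFD_TYPE_INTERRUPT	4u

static uint32_t read_u32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));	/* safe for unaligned p; host byte order */
	return v;
}

/*
 * Hypothetical helper: walk the top-level records after the header and
 * store the handle of each INTERRUPT record, up to max entries.
 * Returns the number of handles stored, or -1 if the buffer is malformed.
 */
int collect_irq_handles(const uint8_t *buf, size_t len,
			uint32_t *handles, size_t max)
{
	size_t off = DEVFD_HEADER_LEN;
	size_t n = 0;

	if (len < DEVFD_HEADER_LEN)
		return -1;

	while (off < len) {
		uint32_t type, rec_len;

		if (len - off < DEVFD_REC_HDR_LEN)
			return -1;	/* truncated record header */
		type = read_u32(buf + off);
		rec_len = read_u32(buf + off + 4);
		/* rec_len includes the record header; 0 would never advance */
		if (rec_len < DEVFD_REC_HDR_LEN || rec_len > len - off)
			return -1;
		/* devfd_interrupts: type, record_len, flags, handle */
		if (type == DEVFD_TYPE_INTERRUPT && rec_len >= 16 && n < max)
			handles[n++] = read_u32(buf + off + 12);
		off += rec_len;
	}
	return (int)n;
}
```

Rejecting a record_len smaller than the 8-byte record header matters: a zero length in a corrupt or truncated file would otherwise keep the loop on the same offset forever.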
>
>   struct devfd_interrupts {
>           __u32 type;         // must be 0x4
>           __u32 record_len;
>           __u32 flags;
>           __u32 handle;       // parameter to VFIO_DEVICE_GET_IRQ_FD
>   };

I'm still on the fence about whether we should implement INTERRUPT for
PCI, or only assume handle 0x0, or maybe assume handle == interrupt pin.

>
> 7. PCI Config Space (PCI_CONFIG_SPACE)
>
> A PCI_CONFIG_SPACE record is a sub-record of a REGION record
> and identifies the region as PCI configuration space.
>
>   struct devfd_cfgspace {
>           __u32 type;         // must be 0x5
>           __u32 record_len;
>           __u32 flags;
>   };
>
> 8. PCI Bar Index (PCI_BAR_INDEX)
>
> A PCI_BAR_INDEX record is a sub-record of a REGION record
> and identifies the PCI BAR index for the region.
>
>   struct devfd_barindex {
>           __u32 type;         // must be 0x6
>           __u32 record_len;
>           __u32 flags;
>           __u32 bar_index;
>   };

I suppose we're more concerned with easy parsing and alignment than
compactness, so a u32 to differentiate 6 BARs + 1 ROM is probably ok.

>
> 9. Physical Address (PHYS_ADDR)
>
> A PHYS_ADDR record is a sub-record of a REGION record
> and specifies the physical address of the region.
>
>   struct devfd_physaddr {
>           __u32 type;         // must be 0x7
>           __u32 record_len;
>           __u32 flags;
>           __u64 phys_addr;
>   };

Thanks,
Alex
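A similarly hedged sketch of the header check from section 2 of the proposal: copy the three u32 fields out of the first bytes read from the device fd and map the magic value to a bus name. Host byte order is assumed because the proposal does not state an endianness, <stdint.h> types stand in for the kernel's __u32, and devfd_bus_name is a hypothetical helper, not an existing vfio interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVFD_MAGIC_PCI	0x70636900u	/* "pci" */
#define DEVFD_MAGIC_DT	0x64740000u	/* "dt" */

/* Same layout as struct devfd_header in the proposal. */
struct devfd_header {
	uint32_t magic;
	uint32_t version;
	uint32_t flags;
};

/*
 * Hypothetical helper: fill *hdr from buf and return the bus name, or
 * NULL if buf is shorter than the header or the magic is unknown.
 */
const char *devfd_bus_name(const uint8_t *buf, size_t len,
			   struct devfd_header *hdr)
{
	if (len < sizeof(*hdr))
		return NULL;
	memcpy(hdr, buf, sizeof(*hdr));
	switch (hdr->magic) {
	case DEVFD_MAGIC_PCI:
		return "pci";
	case DEVFD_MAGIC_DT:
		return "dt";
	default:
		return NULL;	/* not a vfio device file we understand */
	}
}
```

Read most-significant byte first, 0x70636900 spells 'p' 'c' 'i' followed by a NUL byte, and 0x64740000 spells 'd' 't' with two NULs, which is why the magic values double as the bus name. A real consumer would presumably also check version before trusting the record layout that follows.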