On Mon, 2011-08-29 at 16:51 +0000, Yoder Stuart-B08248 wrote:
> Alex Graf, Scott Wood, and I met last week to try to flesh out
> some details as to how vfio could work for non-PCI devices,
> like we have in embedded systems.  This most likely will
> require a different kernel driver than vfio-- for now we are
> calling it "dtio" (for device tree I/O) as there is no way
> to discover these devices except from the device tree.  But
> the dtio driver would use the same architecture and interfaces
> as vfio.

Why is this a different kernel driver?  The difference will primarily be
in what bus types vfio registers drivers and the set of device types the
device fds support.  The group and iommu interfaces will be shared.  This
sounds more like vfio .config options (CONFIG_VFIO_PCI, CONFIG_VFIO_DT).

> For devices on a system bus and represented in a device
> tree we have some different requirements than PCI for what
> is exposed in the device fd file.  A device may have multiple
> address regions, multiple interrupts, a variable length device
> tree path, whether a region is mmapable, etc.
>
> With existing vfio, the device fd file layout is something
> like:
>    0xF Config space offset
>    ...
>    0x6 ROM offset
>    0x5 BAR 5 offset
>    0x4 BAR 4 offset
>    0x3 BAR 3 offset
>    0x2 BAR 2 offset
>    0x1 BAR 1 offset
>    0x0 BAR 0 offset
>
> We have an alternate proposal that we think is more flexible,
> extensible, and will accommodate both PCI and system bus
> type devices (and others).
>
> Instead of config space fixed at 0xf, we would propose
> a header and multiple 'device info' records at offset 0x0 that would
> encode everything that user space needs to know about
> the device:
>
>   0x0  +-------------+-------------+
>        |    magic    |   version   |  u64  // magic u64 identifies the type of
>        |   "vfio"    |             |       // passthru I/O, plus version #
>        |   "dtio"    |             |       // "vfio" - PCI devices
>        +-------------+-------------+       // "dtio" - device tree devices

Maybe magic = "pci", "dt", ...

>        |           flags           |  u32  // encodes any flags (TBD)
>        +---------------------------+
>        |     dev info record N     |
>        |       type                |  u32  // type of record
>        |       rec_len             |  u32  // length in bytes of record
>        |                           |          (including record header)
>        |       flags               |  u32  // type specific flags
>        |       ...content...       |       // record content, which could
>        +---------------------------+       // include sub-records
>        |    dev info record N+1    |
>        +---------------------------+
>        |    dev info record N+2    |
>        +---------------------------+
>        ...
>
> The device info records following the file header have the following
> record types each with content encoded in a record specific way:
>
>   REGION  - describes an addressable address range for the device
>   DTPATH  - describes the device tree path for the device
>   DTINDEX - describes the index into the related device tree
>             property (reg,ranges,interrupts,interrupt-map)

I don't quite understand if these are physical or virtual.

>   INTERRUPT - describes an interrupt for the device
>   PCI_CONFIG_SPACE - describes config space for the device

I would have expected this to be a REGION with a property of
PCI_CONFIG_SPACE.

>   PCI_INFO - domain:bus:device:func

Not entirely sure we need this.  How are you imagining we get from a
group fd to a device fd (wondering if you're only including this for
enumeration)?  I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD
ioctl that takes a char* parameter that contains the dev_name() for the
device requested.  The list of devices under each group can be found by
read()ing the group fd.  If we keep this, we should make the interfaces
similar, in fact, maybe this is how we describe the capabilities of the
iommu too, reading a table from the iommu fd.

>   PCI_BAR_INFO - information about the BAR for a device
>
> For a device tree type device the file may look like:
>
>   0x0  +---------------------------+
>        |          header           |
>        +---------------------------+
>        |   type = REGION           |
>        |   rec_len                 |
>        |   flags =                 |  u32  // region specific flags
>        |     is_mmapable           |
>        |   offset                  |  u64  // seek offset to region
>        |                           |          from beginning
>        |   len                     |  u64  // length of region
>        |   addr                    |  u64  // phys addr of region

Would we only need to expose phys addr for 1:1 mapping requirements?  I'm
not sure why we'd care to expose this otherwise.

>        |                           |
>        +---------------------------+
>        \   type = DTPATH           \       // a sub-region
>        |   rec_len                 |
>        |   flags                   |
>        |   dev tree path           |  char[] // device tree path
>        +---------------------------+
>        \   type = DTINDEX          \       // a sub-region
>        |   rec_len                 |
>        |   flags                   |
>        |   prop_type               |  u32  // REG, RANGES
>        |   prop_index              |  u32  // index into resource list
>        +---------------------------+
>        |   type = INTERRUPT        |
>        |   rec_len                 |
>        |   flags                   |  u32
>        |   ioctl_handle            |  u32  // argument to ioctl to get interrupts

Is this a dynamic ioctl number or just a u32 parameter to an ioctl like
VFIO_DEVICE_GET_IRQ_FD (i.e. an instance number)?

>        |                           |
>        +---------------------------+
>        \   type = DTPATH           \       // a sub-region
>        |   rec_len                 |
>        |   flags                   |
>        |   dev tree path           |  char[] // device tree path
>        +---------------------------+
>        \   type = DTINDEX          \       // a sub-region
>        |   rec_len                 |
>        |   flags                   |
>        |   prop_type               |  u32  // INTERRUPT,INTERRUPT_MAP
>        |   prop_index              |  u32  // index
>
>
> PCI devices would have a PCI specific encoding.  Instead of
> config space and the mappable BAR regions being at specific
> predetermined offsets, the device info records would describe
> this.  Something like:
>
>   0x0  +---------------------------+
>        |   type = PCI_CONFIG_SPACE |
>        |   rec_len                 |
>        |   flags = 0x0             |
>        |   offset                  |  u64  // seek offset to config space
>        |                           |          from beginning
>        |   config_space_len        |  u32  // length of config space

Again, not sure why this isn't just a REGION.

>        +---------------------------+
>        |   type = PCI_INFO         |
>        |   rec_len                 |
>        |   flags = 0x0             |
>        |   dom:bus:dev:func        |  u32  // pci device info
>        +---------------------------+
>        |   type = REGION           |
>        |   rec_len                 |
>        |   flags =                 |
>        |     is_mmapable           |
>        |   offset                  |  u64  // seek offset to region
>        |                           |          from beginning
>        |   len                     |  u64  // length of region
>        |   addr                    |  u64  // physical addr of region
>        +---------------------------+
>        \   type = PCI_BAR_INFO     \
>        |   rec_len                 |
>        |   flags                   |
>        |   bar_type                |       // pio
>        |                           |       // prefetchable mmio
>        |                           |       // non-prefetchable mmio
>        |   bar_index               |       // index of bar in device

Aren't a lot of these typical region attributes?  Wondering if we should
just make them part of the REGION flags or we'll have a growing number
of sub-regions describing common things.  Even for non-PCI we need to
know if the region is pio/mmio32/mmio64/prefetchable/etc.  BAR index
could really just translate to a REGION instance number.

>        +---------------------------+
>
> There may be other more complex device or bus types that
> need their own special encodings, and this format would
> allow the definition of new records to define devices.  Two
> other types that come to mind are Serial Rapid I/O busses
> commonly used in our networking SoCs and the FSL DPAA
> portals which are very strange devices that may require
> their own unique interface exposed to user space.
>
> In short, when user space opens up a device fd it needs
> some information about what this device is, and this
> proposal tries to address that.

Thanks for trying to come up with a specification.

Alex
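
For reference, walking the proposed record list from user space could
look roughly like the sketch below.  It only illustrates the
type/rec_len/flags record header described in the proposal.  The numeric
type values, the struct and function names, and the choice to treat the
records as a flat list (any sub-records are assumed to be covered by the
parent's rec_len and are skipped along with it) are all assumptions,
since the proposal fixes none of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type values: the proposal names the record types
 * but does not assign numbers to them. */
enum dev_info_type {
	DEV_INFO_REGION = 1,
	DEV_INFO_DTPATH,
	DEV_INFO_DTINDEX,
	DEV_INFO_INTERRUPT,
};

/* Common record header from the proposal: three u32 fields. */
struct dev_info_hdr {
	uint32_t type;		/* type of record */
	uint32_t rec_len;	/* length in bytes, including this header */
	uint32_t flags;		/* type specific flags */
};

/* Append one record to buf at offset off; returns the new offset.
 * Only used here to build sample data for the walker below. */
static size_t put_record(uint8_t *buf, size_t off, size_t cap,
			 uint32_t type, uint32_t flags,
			 const void *content, uint32_t content_len)
{
	struct dev_info_hdr h;

	h.type = type;
	h.rec_len = (uint32_t)sizeof(h) + content_len;
	h.flags = flags;

	assert(off + h.rec_len <= cap);
	memcpy(buf + off, &h, sizeof(h));
	if (content_len)
		memcpy(buf + off + sizeof(h), content, content_len);
	return off + h.rec_len;
}

/* Walk the records in buf[0..len) and count those of type want.
 * Returns -1 if a record header is truncated or rec_len is bogus. */
static int count_records(const uint8_t *buf, size_t len, uint32_t want)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		struct dev_info_hdr h;

		if (len - off < sizeof(h))
			return -1;	/* truncated header */
		memcpy(&h, buf + off, sizeof(h));
		if (h.rec_len < sizeof(h) || h.rec_len > len - off)
			return -1;	/* would loop forever or overrun */
		if (h.type == want)
			n++;
		off += h.rec_len;	/* unknown types are skipped */
	}
	return n;
}
```

A caller would read() the record area of the device fd into a buffer and
hand it to count_records().  Because every record carries its own
length, a record type the caller does not understand can simply be
stepped over, which is what makes the format extensible.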