Re: RFC: vfio / device assignment -- layout of device fd files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote:
> On 08/29/2011 02:51 PM, Alex Williamson wrote:
> > On Mon, 2011-08-29 at 16:51 +0000, Yoder Stuart-B08248 wrote:
> >> The device info records following the file header have the following
> >> record types each with content encoded in a record specific way:
> >>
> >>  REGION  - describes an addressable address range for the device
> >>  DTPATH - describes the device tree path for the device
> >>  DTINDEX - describes the index into the related device tree
> >>            property (reg,ranges,interrupts,interrupt-map)
> > 
> > I don't quite understand if these are physical or virtual.
> 
> If what are physical or virtual?

Can you give an example of a path vs an index?  I don't understand
enough about these to ask a useful question about what they're
describing.

> >>  INTERRUPT - describes an interrupt for the device
> >>  PCI_CONFIG_SPACE - describes config space for the device
> > 
> > I would have expected this to be a REGION with a property of
> > PCI_CONFIG_SPACE.
> 
> Could be, if physical address is made optional.

Or physical address is also a property, aka sub-region.

> >>  PCI_INFO - domain:bus:device:func
> > 
> > Not entirely sure we need this.  How are you imagining we get from a
> > group fd to a device fd (wondering if you're only including this for
> > enumeration)?  I'm currently coding it as a VFIO_GROUP_GET_DEVICE_FD
> > ioctl that takes a char* parameter that contains the dev_name() for the
> > device requested.  The list of devices under each group can be found by
> > read()ing the group fd.  
> 
> Seems like it would be nice to keep this information around, rather than
> require the user to pass it around separately.  Shouldn't cost much.

Seems redundant.  The user had to know the d:b:d.f to get the device fd
in the first place.  At best it's a sanity check.

> > If we keep this, we should make the interfaces similar, 
> 
> Yes.
> 
> >>  PCI_BAR_INFO - information about the BAR for a device
> >>
> >> For a device tree type device the file may look like:
> >>
> >>  0x0+---------------------------+
> >>     |        header             |      
> >>     +---------------------------+
> >>     |   type = REGION           |      
> >>     |   rec_len                 |      
> >>     |   flags =                 | u32 // region specific flags
> >>     |       is_mmapable         | 
> >>     |   offset                  | u64 // seek offset to region from
> >>     |                           |        from beginning
> >>     |   len                     | u64 // length of region
> >>     |   addr                    | u64 // phys addr of region
> > 
> > Would we only need to expose phys addr for 1:1 mapping requirements?
> > I'm not sure why we'd care to expose this otherwise.
> 
> It's more important for non-PCI, where it avoids the need for userspace
> to parse the device tree to find the guest address (we'll usually want
> 1:1), or to consolidate pages shared by multiple regions.  It could be
> nice for debugging, as well.

So the device tree path is ripped straight from the system, so it's the
actual 1:1, matching physical hardware, path.

> Seems like something that's better to have and not need, than the other
> way around.  It would need to be optional, though, if we want to be able
> to describe things without a physical address like config space.

sub-region?

> >>     |                           |      
> >>     +---------------------------+
> >>      \   type = DTPATH          \  // a sub-region
> >>       |   rec_len                |      
> >>       |   flags                  |      
> >>       |   dev tree path          | char[] // device tree path
> >>     +---------------------------+
> >>      \   type = DTINDEX         \  // a sub-region
> >>       |   rec_len                |      
> >>       |   flags                  |      
> >>       |   prop_type              | u32  // REG, RANGES
> >>       |   prop_index             | u32  // index  into resource list
> >>     +---------------------------+
> >>     |  type = INTERRUPT         |      
> >>     |  rec_len                  |      
> >>     |  flags                    | u32 
> >>     |  ioctl_handle             | u32 // argument to ioctl to get interrupts
> > 
> > Is this a dynamic ioctl number or just a u32 parameter to an ioctl like
> > VFIO_DEVICE_GET_IRQ_FD (ie. an instance number)?
> 
> It's a parameter to VFIO_DEVICE_GET_IRQ_FD.
> 
> >>     +---------------------------+
> >>     |   type = PCI_INFO         |      
> >>     |   rec_len                 |      
> >>     |   flags = 0x0             |      
> >>     |   dom:bus:dev:func        | u32 // pci device info
> >>     +---------------------------+
> >>     |   type = REGION           |      
> >>     |   rec_len                 |      
> >>     |   flags =                 |      
> >>     |       is_mmapable         |      
> >>     |   offset                  | u64 // seek offset to region from
> >>     |                           |        from beginning
> >>     |   len                     | u64 // length of region
> >>     |   addr                    | u64 // physical addr of region
> >>     +---------------------------+
> >>      \   type = PCI_BAR_INFO    \      
> >>       |   rec_len                |      
> >>       |   flags                  |      
> >>       |   bar_type               |  // pio
> >>       |                          |  // prefetable mmio
> >>       |                          |  // non-prefetchable mmmio
> >>       |   bar_index              |  // index of bar in device
> > 
> > Aren't a lot of these typical region attributes?  Wondering if we should
> > just make them part of the REGION flags or we'll have a growing number
> > of sub-regions describing common things.
> 
> It'd be nice to keep the base region record common among
> PCI/DT/whatever.
> 
> PCI_BAR_INFO itself can grow using flags if necessary.
> 
> > Even for non-PCI we need to
> > know if the region is pio/mmio32/mmio64/prefetchable/etc.
> 
> Outside of PCI, what standardized form would you put such information
> in?  Where would the kernel get this information?  What does
> mmio32/mmio64 mean in this context?

I could imagine a platform device described by ACPI that might want to
differentiate.  The physical device doesn't get moved of course, but
guest drivers might care how the device is described if we need to
rebuild those ACPI tables.  ACPI might even be a good place to leverage
these data structures... /me ducks.

> > BAR index could really just translate to a REGION instance number.
> 
> How would that work if you make non-BAR things (such as config space)
> into regions?

Put their instance numbers outside of the BAR region?  We have a fixed
REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 ==
instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7).  BTW, we
need a read-only flag for ROMs too.  Thanks,

Alex


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux