On 07/02/2013 06:25:59 PM, Yoder Stuart-B08248 wrote:
The write-up below is the first draft of a proposal for how the kernel can expose platform devices to user space using vfio.
In short, I'm proposing a new ioctl, VFIO_DEVICE_GET_DEVTREE_INFO, which allows user space to correlate regions and interrupts with the corresponding device tree node structure that is defined for most platform devices.
Regards,
Stuart Yoder
------------------------------------------------------------------------------
VFIO for Platform Devices
The existing infrastructure for vfio-pci is pretty close to what we need:
-mechanism to create a container
-add groups/devices to a container
-set the IOMMU model
-map DMA regions
-get an fd for a specific device, which allows user space to determine info about device regions (e.g. registers) and interrupts
-support for mmapping device regions
-mechanism to set how interrupts are signaled
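For reference, the existing sequence listed above looks roughly like this in C. This is a minimal sketch using the ioctls from <linux/vfio.h>; the function name vfio_open_device, the group path and device name arguments, and the IOMMU model passed in are assumptions for illustration (which IOMMU type applies to Freescale platform devices is not settled here). DMA mapping and interrupt setup are left out.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Hypothetical helper: open dev_name inside the VFIO group at group_path
 * (e.g. "/dev/vfio/26") using the given IOMMU model. Returns a device fd,
 * or -1 on failure. */
static int vfio_open_device(const char *group_path, const char *dev_name,
                            unsigned long iommu_type)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    int container, group, device;

    /* Create a container and check the API version. */
    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
        goto out_container;

    /* Open the group; it is only usable if all its devices are bound to vfio. */
    group = open(group_path, O_RDWR);
    if (group < 0)
        goto out_container;
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        goto out_group;

    /* Add the group to the container, then set the IOMMU model. */
    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0)
        goto out_group;
    if (ioctl(container, VFIO_SET_IOMMU, iommu_type) < 0)
        goto out_group;

    /* DMA regions would be mapped here with VFIO_IOMMU_MAP_DMA. */

    /* Get the device fd; region/irq info and mmap all go through it. */
    device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, dev_name);
    if (device >= 0)
        return device; /* group and container fds stay open while in use */

out_group:
    close(group);
out_container:
    close(container);
    return -1;
}
```

In this sketch the group and container fds are intentionally left open on success, since the device is only usable while the group remains attached to a container; a real caller would keep them alongside the device fd.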
Platform devices can get complicated, potentially with a tree hierarchy of nodes and links/phandles pointing to other platform devices. The kernel doesn't expose relationships between devices; it just exposes mappable register regions and interrupts. It's up to user space to work out relationships between devices if it needs to, and it can determine them from the device tree exposed in /proc/device-tree.
I think the changes needed for vfio concern some of the device-tree-related info that needs to be available through the device fd.
1. VFIO_GROUP_GET_DEVICE_FD
User space has to know which device it is accessing and will call VFIO_GROUP_GET_DEVICE_FD, passing a specific platform device path, to get the device information:
fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "/soc@ffe000000/usb@210000");
(Whether the path is a device tree path or a sysfs path, e.g. "/sys/bus/platform/devices/ffe210000.usb", is up for discussion.)
Doesn't VFIO need to operate on an actual Linux device, rather than
just an OF node?
Are we going to have a fixed assumption that you always want all the
children of the node corresponding to the assigned device, or will it
be possible to exclude some?
2. VFIO_DEVICE_GET_INFO
I don't think any changes are needed to VFIO_DEVICE_GET_INFO other than adding a new flag identifying a device as a 'platform' device.
This ioctl simply returns the number of regions and the number of irqs. The number of regions is the number of regions that can be mapped for the device, i.e. the regions defined in the "reg" and "ranges" properties in the device tree.
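To make the region numbering concrete, the sketch below walks a device fd the way the existing API already allows: VFIO_DEVICE_GET_INFO for the counts, then VFIO_DEVICE_GET_REGION_INFO and mmap for each mappable region. It assumes a device fd obtained via VFIO_GROUP_GET_DEVICE_FD; the function name is hypothetical, and it makes no attempt to select cacheability.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Hypothetical helper: list a device's regions and mmap the mappable ones.
 * Returns the number of regions mapped, or -1 on error. Mappings are not
 * tracked or unmapped here. */
static int vfio_map_regions(int device)
{
    struct vfio_device_info info = { .argsz = sizeof(info) };
    int mapped = 0;

    if (ioctl(device, VFIO_DEVICE_GET_INFO, &info) < 0)
        return -1;

    for (unsigned int i = 0; i < info.num_regions; i++) {
        struct vfio_region_info reg = { .argsz = sizeof(reg), .index = i };
        int prot = 0;
        void *p;

        if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg) < 0)
            return -1;
        printf("region %u: size 0x%llx offset 0x%llx flags 0x%x\n", i,
               (unsigned long long)reg.size,
               (unsigned long long)reg.offset, reg.flags);

        if (!(reg.flags & VFIO_REGION_INFO_FLAG_MMAP))
            continue;
        if (reg.flags & VFIO_REGION_INFO_FLAG_READ)
            prot |= PROT_READ;
        if (reg.flags & VFIO_REGION_INFO_FLAG_WRITE)
            prot |= PROT_WRITE;

        /* Each region is exposed at reg.offset within the device fd. */
        p = mmap(NULL, reg.size, prot, MAP_SHARED, device, reg.offset);
        if (p == MAP_FAILED)
            return -1;
        mapped++;
    }
    return mapped;
}
```

Note that nothing here tells user space which "reg" or "ranges" entry region i came from, which is the gap the new ioctl is meant to fill.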
3. VFIO_DEVICE_GET_REGION_INFO
No changes needed, except perhaps adding a new flag: Freescale has some devices with regions that must be mapped cacheable.
While I don't object to making the information available to the user just in case, the main thing we need here is to influence what the kernel does when the user tries to map it. At least on PPC, it's not up to userspace to select whether an mmap is cacheable.
4. VFIO_DEVICE_GET_DEVTREE_INFO
The VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO APIs expose device regions and interrupts, but it's not enough to know that there are X regions and Y interrupts. User space needs to know what the resources are for, so that it can correlate those regions/interrupts with the device tree structure that drivers use. The device tree structure could consist of multiple nodes, and it is necessary to identify the node corresponding to each region/interrupt exposed by VFIO.
The following information is needed:
-the device tree path to the node corresponding to the region or interrupt
-for a region, whether it corresponds to a "reg" or "ranges" property
-there could be multiple sub-regions per "reg" or "ranges", and the sub-index within the reg/ranges is needed
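As an illustration of what user space would do with a path and a sub-index, the sketch below decodes one entry of a raw "reg" property as read from /proc/device-tree/<path>/reg. Property cells are stored as big-endian 32-bit values, and the cell counts come from the parent node's #address-cells and #size-cells. The function names are hypothetical, only 1 or 2 cells per field are handled, and the resulting address is still in the parent bus's address space (it has not been translated through any "ranges").

```c
#include <stddef.h>
#include <stdint.h>

/* Combine `cells` big-endian 32-bit cells starting at p into one value. */
static uint64_t dt_read_cells(const uint8_t *p, unsigned int cells)
{
    uint64_t v = 0;
    for (unsigned int c = 0; c < cells; c++, p += 4)
        v = (v << 32) | ((uint64_t)p[0] << 24 | (uint64_t)p[1] << 16 |
                         (uint64_t)p[2] << 8 | (uint64_t)p[3]);
    return v;
}

/* Hypothetical helper: decode entry idx of a raw "reg" property.
 * addr_cells/size_cells are the parent's #address-cells/#size-cells.
 * Returns 0 on success, -1 if idx is out of range or cells are unsupported. */
static int dt_reg_entry(const uint8_t *prop, size_t len,
                        unsigned int addr_cells, unsigned int size_cells,
                        unsigned int idx, uint64_t *addr, uint64_t *size)
{
    size_t entry = 4u * (addr_cells + size_cells);

    if (addr_cells > 2 || size_cells > 2 || entry == 0 || idx >= len / entry)
        return -1;
    prop += (size_t)idx * entry;
    *addr = dt_read_cells(prop, addr_cells);
    *size = dt_read_cells(prop + 4u * addr_cells, size_cells);
    return 0;
}
```

For example, with one address cell and one size cell, the bytes 00 30 00 00 00 01 00 00 decode as entry 0 with address 0x300000 and size 0x10000.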
The VFIO_DEVICE_GET_DEVTREE_INFO operates on a device fd.
ioctl: VFIO_DEVICE_GET_DEVTREE_INFO
struct vfio_path_info {
__u32 argsz;
__u32 flags;
#define VFIO_DEVTREE_INFO_RANGES (1 << 3) /* the region is a "ranges" property */
What about distinguishing a normal interrupt from one found in an
interrupt-map?
In the case of both ranges and interrupt-maps, we'll also want to
decide what the policy is for when to expose them directly, versus just
using them to translate regs and interrupts of child nodes.
__u32 index; /* input: index of region or irq for which we are getting info */
__u32 type;  /* input: 0 - get devtree info for a region
                       1 - get devtree info for an irq */
__u32 start; /* output: identifies the index within the reg/ranges */
"start" is an odd name for this. I'd rename "index" to "vfio_index"
and this to "dt_index".
__u8 path[]; /* output: full path to associated device tree node */
};
How does the caller know what size buffer to supply for this?
The VFIO_DEVICE_GET_DEVTREE_INFO ioctl would return:
-for region index 0:
flags: 0x0 // i.e. this is a "reg" property
start: 0x0 // i.e. index 0x0 in "reg"
path: "/soc@ffe000000/crypto@300000"
-for interrupt index 0:
path: "/soc@ffe000000/crypto@300000/jr@1000"
-for interrupt index 1:
path: "/soc@ffe000000/crypto@300000/jr@2000"
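One possible answer to the buffer-size question, sketched under assumptions: the struct below is a hypothetical completion of the proposal using the vfio_index/dt_index names suggested in review, with uint32_t and char standing in for __u32 and __u8 so it builds in plain user space. devtree_info_fill is a hypothetical stand-in for the kernel side. It follows the VFIO argsz pattern, but writing the required size back into argsz and failing with -ENOSPC is an assumption of this sketch, not part of the proposal.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; not from any kernel header. */
struct vfio_devtree_info {
    uint32_t argsz;      /* in: bytes available; out: bytes required */
    uint32_t flags;      /* out: VFIO_DEVTREE_INFO_RANGES for "ranges" */
    uint32_t vfio_index; /* in: region or irq index */
    uint32_t type;       /* in: 0 = region, 1 = irq */
    uint32_t dt_index;   /* out: index within reg/ranges */
    char path[];         /* out: NUL-terminated device tree path */
};

#define VFIO_DEVTREE_INFO_RANGES (1 << 3)

/* Hypothetical kernel-side fill. If the caller's buffer is too small,
 * report the required size in argsz and fail, so user space can retry
 * with a buffer of exactly that size. */
static int devtree_info_fill(struct vfio_devtree_info *info, uint32_t flags,
                             uint32_t dt_index, const char *path)
{
    size_t len = strlen(path) + 1;
    uint32_t need = (uint32_t)(sizeof(*info) + len);

    if (info->argsz < need) {
        info->argsz = need;
        return -ENOSPC;
    }
    info->flags = flags;
    info->dt_index = dt_index;
    memcpy(info->path, path, len);
    info->argsz = need;
    return 0;
}
```

With this convention, user space would first call with argsz = sizeof(struct vfio_devtree_info), allocate the argsz bytes reported back, and call again to get the path.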
Where is "start" for the interrupts?
-Scott