On 05.11.14 13:03, Eric Auger wrote:
> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>
>>
>> On 31.10.14 15:05, Eric Auger wrote:
>>> Minimal VFIO platform implementation supporting
>>> - register space user mapping,
>>> - IRQ assignment based on eventfds handled on qemu side.
>>>
>>> irqfd kernel acceleration comes in a subsequent patch.
>>>
>>> Signed-off-by: Kim Phillips <kim.phillips@xxxxxxxxxx>
>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>
>>> ---
>>> v6 -> v7:
>>> - compat is not exposed anymore as a user option. Rationale is
>>>   the vfio device became abstract and a specialization is needed
>>>   anyway. The derived device must set the compat string.
>>> - in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
>>>   A new function dubbed vfio_register_irq_starter replaces it. It
>>>   registers a machine init done notifier that programs & starts
>>>   all dynamic VFIO device IRQs. This function is supposed to be
>>>   called by the machine file. A set of static helper routines are
>>>   added too. It must be called before the creation of the platform
>>>   bus device.
>>>
>>> v5 -> v6:
>>> - vfio_device property renamed into host property
>>> - correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
>>>   and remove PCI related comment
>>> - remove declaration of vfio_setup_irqfd and irqfd_allowed
>>>   property.Both belong to next patch (irqfd)
>>> - remove declaration of vfio_intp_interrupt in vfio-platform.h
>>> - functions that can be static get this characteristic
>>> - remove declarations of vfio_region_ops, vfio_memory_listener,
>>>   group_list, vfio_address_spaces. All are moved to vfio-common.h
>>> - remove vfio_put_device declaration and definition
>>> - print_regions removed. code moved into vfio_populate_regions
>>> - replace DPRINTF by trace events
>>> - new helper routine to set the trigger eventfd
>>> - dissociate intp init from the injection enablement:
>>>   vfio_enable_intp renamed into vfio_init_intp and new function
>>>   named vfio_start_eventfd_injection
>>> - injection start moved to vfio_start_irq_injection (not anymore
>>>   in vfio_populate_interrupt)
>>> - new start_irq_fn field in VFIOPlatformDevice corresponding to
>>>   the function that will be used for starting injection
>>> - user handled eventfd:
>>>   x add mutex to protect IRQ state & list manipulation,
>>>   x correct misleading comment in vfio_intp_interrupt.
>>>   x Fix bugs thanks to fake interrupt modality
>>> - VFIOPlatformDeviceClass becomes abstract
>>> - add error_setg in vfio_platform_realize
>>>
>>> v4 -> v5:
>>> - vfio-plaform.h included first
>>> - cleanup error handling in *populate*, vfio_get_device,
>>>   vfio_enable_intp
>>> - vfio_put_device not called anymore
>>> - add some includes to follow vfio policy
>>>
>>> v3 -> v4:
>>> [Eric Auger]
>>> - merge of "vfio: Add initial IRQ support in platform device"
>>>   to get a full functional patch although perfs are limited.
>>> - removal of unrealize function since I currently understand
>>>   it is only used with device hot-plug feature.
>>>
>>> v2 -> v3:
>>> [Eric Auger]
>>> - further factorization between PCI and platform (VFIORegion,
>>>   VFIODevice). same level of functionality.
>>>
>>> <= v2:
>>> [Kim Philipps]
>>> - Initial Creation of the device supporting register space mapping
>>> ---
>>>  hw/vfio/Makefile.objs           |   1 +
>>>  hw/vfio/platform.c              | 672 ++++++++++++++++++++++++++++++++++++++++
>>>  include/hw/vfio/vfio-common.h   |   1 +
>>>  include/hw/vfio/vfio-platform.h |  87 ++++++
>>>  trace-events                    |  12 +
>>>  5 files changed, 773 insertions(+)
>>>  create mode 100644 hw/vfio/platform.c
>>>  create mode 100644 include/hw/vfio/vfio-platform.h
>>>
>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>>> index e31f30e..c5c76fe 100644
>>> --- a/hw/vfio/Makefile.objs
>>> +++ b/hw/vfio/Makefile.objs
>>> @@ -1,4 +1,5 @@
>>>  ifeq ($(CONFIG_LINUX), y)
>>>  obj-$(CONFIG_SOFTMMU) += common.o
>>>  obj-$(CONFIG_PCI) += pci.o
>>> +obj-$(CONFIG_SOFTMMU) += platform.o
>>>  endif
>>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>>> new file mode 100644
>>> index 0000000..9f66610
>>> --- /dev/null
>>> +++ b/hw/vfio/platform.c
>>> @@ -0,0 +1,672 @@
>>> +/*
>>> + * vfio based device assignment support - platform devices
>>> + *
>>> + * Copyright Linaro Limited, 2014
>>> + *
>>> + * Authors:
>>> + *  Kim Phillips <kim.phillips@xxxxxxxxxx>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>>> + * the COPYING file in the top-level directory.
>>> + *
>>> + * Based on vfio based PCI device assignment support:
>>> + *  Copyright Red Hat, Inc. 2012
>>> + */
>>> +
>>> +#include <linux/vfio.h>
>>> +#include <sys/ioctl.h>
>>> +
>>> +#include "hw/vfio/vfio-platform.h"
>>> +#include "qemu/error-report.h"
>>> +#include "qemu/range.h"
>>> +#include "sysemu/sysemu.h"
>>> +#include "exec/memory.h"
>>> +#include "qemu/queue.h"
>>> +#include "hw/sysbus.h"
>>> +#include "trace.h"
>>> +#include "hw/platform-bus.h"
>>> +
>>> +static void vfio_intp_interrupt(VFIOINTp *intp);
>>> +typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
>>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>>> +                                    eventfd_user_side_handler_t handler);
>>> +
>>> +/*
>>> + * Functions only used when eventfd are handled on user-side
>>> + * ie. without irqfd
>>> + */
>>> +
>>> +/**
>>> + * vfio_platform_eoi - IRQ completion routine
>>> + * @vbasedev: the VFIO device
>>> + *
>>> + * de-asserts the active virtual IRQ and unmask the physical IRQ
>>> + * (masked by the VFIO driver). Handle pending IRQs if any.
>>> + * eoi function is called on the first access to any MMIO region
>>> + * after an IRQ was triggered. It is assumed this access corresponds
>>> + * to the IRQ status register reset. With such a mechanism, a single
>>> + * IRQ can be handled at a time since there is no way to know which
>>> + * IRQ was completed by the guest (we would need additional details
>>> + * about the IRQ status register mask)
>>> + */
>>> +static void vfio_platform_eoi(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOINTp *intp;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +
>>> +    qemu_mutex_lock(&vdev->intp_mutex);
>>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>>> +        if (intp->state == VFIO_IRQ_ACTIVE) {
>>> +            trace_vfio_platform_eoi(intp->pin,
>>> +                                event_notifier_get_fd(&intp->interrupt));
>>> +            intp->state = VFIO_IRQ_INACTIVE;
>>> +
>>> +            /* deassert the virtual IRQ and unmask physical one */
>>> +            qemu_set_irq(intp->qemuirq, 0);
>>> +            vfio_unmask_irqindex(vbasedev, intp->pin);
>>> +
>>> +            /* a single IRQ can be active at a time */
>>> +            break;
>>> +        }
>>> +    }
>>> +    /* in case there are pending IRQs, handle them one at a time */
>>> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
>>> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
>>> +        trace_vfio_platform_eoi_handle_pending(intp->pin);
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +        vfio_intp_interrupt(intp);
>>> +        qemu_mutex_lock(&vdev->intp_mutex);
>>> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +    } else {
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * vfio_mmap_set_enabled - enable/disable the fast path mode
>>> + * @vdev: the VFIO platform device
>>> + * @enabled: the target mmap state
>>> + *
>>> + * true ~ fast path = MMIO region is mmaped (no KVM TRAP)
>>> + * false ~ slow path = MMIO region is trapped and region callbacks
>>> + * are called slow path enables to trap the IRQ status register
>>> + * guest reset
>>> +*/
>>> +
>>> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
>>> +{
>>> +    VFIORegion *region;
>>> +    int i;
>>> +
>>> +    trace_vfio_platform_mmap_set_enabled(enabled);
>>> +
>>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>>> +        region = vdev->regions[i];
>>> +
>>> +        /* register space is unmapped to trap EOI */
>>> +        memory_region_set_enabled(&region->mmap_mem, enabled);
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * vfio_intp_mmap_enable - timer function, restores the fast path
>>> + * if there is no more active IRQ
>>> + * @opaque: actually points to the VFIO platform device
>>> + *
>>> + * Called on mmap timer timout, this function checks whether the
>>> + * IRQ is still active and in the negative restores the fast path.
>>> + * by construction a single eventfd is handled at a time.
>>> + * if the IRQ is still active, the timer is restarted.
>>> + */
>>> +static void vfio_intp_mmap_enable(void *opaque)
>>> +{
>>> +    VFIOINTp *tmp;
>>> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
>>> +
>>> +    qemu_mutex_lock(&vdev->intp_mutex);
>>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>>> +            trace_vfio_platform_intp_mmap_enable(tmp->pin);
>>> +            /* re-program the timer to check active status later */
>>> +            timer_mod(vdev->mmap_timer,
>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>> +                          vdev->mmap_timeout);
>>> +            qemu_mutex_unlock(&vdev->intp_mutex);
>>> +            return;
>>> +        }
>>> +    }
>>> +    vfio_mmap_set_enabled(vdev, true);
>>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>>> +}
>>> +
>>> +/**
>>> + * vfio_intp_interrupt - The user-side eventfd handler
>>> + * @opaque: opaque pointer which in practice is the VFIOINTp*
>>> + *
>>> + * the function can be entered
>>> + * - in event handler context: this IRQ is inactive
>>> + *   in that case, the vIRQ is injected into the guest if there
>>> + *   is no other active or pending IRQ.
>>> + * - in IOhandler context: this IRQ is pending.
>>> + *   there is no ACTIVE IRQ
>>> + */
>>> +static void vfio_intp_interrupt(VFIOINTp *intp)
>>> +{
>>> +    int ret;
>>> +    VFIOINTp *tmp;
>>> +    VFIOPlatformDevice *vdev = intp->vdev;
>>> +    bool delay_handling = false;
>>> +
>>> +    qemu_mutex_lock(&vdev->intp_mutex);
>>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>>> +        QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>>> +            if (tmp->state == VFIO_IRQ_ACTIVE ||
>>> +                tmp->state == VFIO_IRQ_PENDING) {
>>> +                delay_handling = true;
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +    if (delay_handling) {
>>> +        /*
>>> +         * the new IRQ gets a pending status and is pushed in
>>> +         * the pending queue
>>> +         */
>>> +        intp->state = VFIO_IRQ_PENDING;
>>> +        trace_vfio_intp_interrupt_set_pending(intp->pin);
>>> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
>>> +                             intp, pqnext);
>>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +        return;
>>> +    }
>>> +
>>> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
>>> +    trace_vfio_platform_intp_interrupt(intp->pin,
>>> +                              event_notifier_get_fd(&intp->interrupt));
>>> +
>>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>>> +        if (!ret) {
>>> +            error_report("Error when clearing fd=%d (ret = %d)\n",
>>> +                         event_notifier_get_fd(&intp->interrupt), ret);
>>> +        }
>>> +    } /* else this is a pending IRQ that moves to ACTIVE state */
>>> +
>>> +    intp->state = VFIO_IRQ_ACTIVE;
>>> +
>>> +    /* sets slow path */
>>> +    vfio_mmap_set_enabled(vdev, false);
>>> +
>>> +    /* trigger the virtual IRQ */
>>> +    qemu_set_irq(intp->qemuirq, 1);
>>> +
>>> +    /* schedule the mmap timer which will restore mmap path after EOI*/
>>> +    if (vdev->mmap_timeout) {
>>> +        timer_mod(vdev->mmap_timer,
>>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>> +                      vdev->mmap_timeout);
>>> +    }
>>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>>> +}
>>> +
>>> +/**
>>> + * vfio_start_eventfd_injection - starts the virtual IRQ injection using
>>> + * user-side handled eventfds
>>> + * @intp: the IRQ struct pointer
>>> + */
>>> +
>>> +static int vfio_start_eventfd_injection(VFIOINTp *intp)
>>> +{
>>> +    int ret;
>>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>>> +
>>> +    vfio_mask_irqindex(vbasedev, intp->pin);
>>> +
>>> +    ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
>>> +    if (ret) {
>>> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
>>> +        vfio_unmask_irqindex(vbasedev, intp->pin);
>>> +        return ret;
>>> +    }
>>> +    vfio_unmask_irqindex(vbasedev, intp->pin);
>>> +    return 0;
>>> +}
>>> +
>>> +/*
>>> + * Functions used whatever the injection method
>>> + */
>>> +
>>> +/**
>>> + * vfio_set_trigger_eventfd - set VFIO eventfd handling
>>> + * ie. program the VFIO driver to associates a given IRQ index
>>> + * with a fd handler
>>> + *
>>> + * @intp: IRQ struct pointer
>>> + * @handler: handler to be called on eventfd trigger
>>> + */
>>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>>> +                                    eventfd_user_side_handler_t handler)
>>> +{
>>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>>> +    struct vfio_irq_set *irq_set;
>>> +    int argsz, ret;
>>> +    int32_t *pfd;
>>> +
>>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>>> +    irq_set = g_malloc0(argsz);
>>> +    irq_set->argsz = argsz;
>>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
>>> +    irq_set->index = intp->pin;
>>> +    irq_set->start = 0;
>>> +    irq_set->count = 1;
>>> +    pfd = (int32_t *)&irq_set->data;
>>> +    *pfd = event_notifier_get_fd(&intp->interrupt);
>>> +    qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
>>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>> +    g_free(irq_set);
>>> +    if (ret < 0) {
>>> +        error_report("vfio: Failed to set trigger eventfd: %m");
>>> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +/* not implemented yet */
>>> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
>>> +{
>>> +return false;
>>> +}
>>> +
>>> +/* not implemented yet */
>>> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
>>> +{
>>> +return 0;
>>> +}
>>> +
>>> +/**
>>> + * vfio_init_intp - allocate, initialize the IRQ struct pointer
>>> + * and add it into the list of IRQ
>>> + * @vbasedev: the VFIO device
>>> + * @index: VFIO device IRQ index
>>> + */
>>> +static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
>>> +{
>>> +    int ret;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
>>> +    VFIOINTp *intp;
>>> +
>>> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
>>> +    intp = g_malloc0(sizeof(*intp));
>>> +    intp->vdev = vdev;
>>> +    intp->pin = index;
>>> +    intp->state = VFIO_IRQ_INACTIVE;
>>> +    sysbus_init_irq(sbdev, &intp->qemuirq);
>>> +
>>> +    /* Get an eventfd for trigger */
>>> +    ret = event_notifier_init(&intp->interrupt, 0);
>>> +    if (ret) {
>>> +        g_free(intp);
>>> +        error_report("vfio: Error: trigger event_notifier_init failed ");
>>> +        return NULL;
>>> +    }
>>> +
>>> +    /* store the new intp in qlist */
>>> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>>> +    return intp;
>>> +}
>>> +
>>> +/**
>>> + * vfio_populate_device - initialize MMIO region and IRQ
>>> + * @vbasedev: the VFIO device
>>> + *
>>> + * query the VFIO device for exposed MMIO regions and IRQ and
>>> + * populate the associated fields in the device struct
>>> + */
>>> +static int vfio_populate_device(VFIODevice *vbasedev)
>>> +{
>>> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
>>> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>>> +    VFIOINTp *intp;
>>> +    int i, ret = 0;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +
>>> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
>>> +        reg_info.index = i;
>>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>>> +        if (ret) {
>>> +            error_report("vfio: Error getting region %d info: %m", i);
>>> +            goto error;
>>> +        }
>>> +        vdev->regions[i]->flags = reg_info.flags;
>>> +        vdev->regions[i]->size = reg_info.size;
>>> +        vdev->regions[i]->fd_offset = reg_info.offset;
>>> +        vdev->regions[i]->nr = i;
>>> +        vdev->regions[i]->vbasedev = vbasedev;
>>> +
>>> +        trace_vfio_platform_populate_regions(vdev->regions[i]->nr,
>>> +                            (unsigned long)vdev->regions[i]->flags,
>>> +                            (unsigned long)vdev->regions[i]->size,
>>> +                            vdev->regions[i]->vbasedev->fd,
>>> +                            (unsigned long)vdev->regions[i]->fd_offset);
>>> +    }
>>> +
>>> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>>> +                                    vfio_intp_mmap_enable, vdev);
>>> +
>>> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>>> +
>>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>>> +        irq.index = i;
>>> +
>>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>>> +        if (ret) {
>>> +            error_printf("vfio: error getting device %s irq info",
>>> +                         vbasedev->name);
>>> +            return ret;
>>> +        } else {
>>> +            trace_vfio_platform_populate_interrupts(irq.index,
>>> +                                                    irq.count,
>>> +                                                    irq.flags);
>>> +            intp = vfio_init_intp(vbasedev, irq.index);
>>> +            if (!intp) {
>>> +                error_report("vfio: Error installing IRQ %d up", i);
>>> +                return ret;
>>> +            }
>>> +        }
>>> +    }
>>> +    return 0;
>>> +error:
>>> +    return ret;
>>> +}
>>> +
>>> +/*
>>> + * vfio_start_irq_injection - associates a virtual irq to a
>>> + * VFIO IRQ index and start the injection of this IRQ
>>> + * @s: SysBus Device
>>> + * @index: VFIO IRQ index
>>> + * @virq: the virtual IRQ number, aka gsi
>>> + *
>>> + * this function is called when the device tree is built
>>> + */
>>> +static void vfio_start_irq_injection(SysBusDevice *s, int index, int virq)
>>> +{
>>> +    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
>>> +    VFIOINTp *intp;
>>> +
>>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>>> +        if (intp->pin == index) {
>>> +            intp->virtualID = virq;
>>> +            vdev->start_irq_fn(intp);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +/* specialized functions ofr VFIO Platform devices */
>>> +static VFIODeviceOps vfio_platform_ops = {
>>> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
>>> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
>>> +    .vfio_eoi = vfio_platform_eoi,
>>> +    .vfio_populate_device = vfio_populate_device,
>>> +};
>>> +
>>> +/**
>>> + * vfio_base_device_init - implements some of the VFIO mechanics
>>> + * @vbasedev: the VFIO device
>>> + *
>>> + * retrieves the group the device belongs to and get the device fd
>>> + * returns the VFIO device fd
>>> + * precondition: the device name must be initialized
>>> + */
>>> +static int vfio_base_device_init(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOGroup *group;
>>> +    VFIODevice *vbasedev_iter;
>>> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>>> +    ssize_t len;
>>> +    struct stat st;
>>> +    int groupid;
>>> +    int ret;
>>> +
>>> +    /* name must be set prior to the call */
>>> +    if (!vbasedev->name) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /* Check that the host device exists */
>>> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
>>> +             vbasedev->name);
>>> +
>>> +    if (stat(path, &st) < 0) {
>>> +        error_report("vfio: error: no such host device: %s", path);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>>> +    len = readlink(path, iommu_group_path, sizeof(path));
>>> +    if (len <= 0 || len >= sizeof(path)) {
>>> +        error_report("vfio: error no iommu_group for device");
>>> +        return len < 0 ? -errno : ENAMETOOLONG;
>>> +    }
>>> +
>>> +    iommu_group_path[len] = 0;
>>> +    group_name = basename(iommu_group_path);
>>> +
>>> +    if (sscanf(group_name, "%d", &groupid) != 1) {
>>> +        error_report("vfio: error reading %s: %m", path);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    trace_vfio_platform_base_device_init(vbasedev->name, groupid);
>>> +
>>> +    group = vfio_get_group(groupid, &address_space_memory);
>>> +    if (!group) {
>>> +        error_report("vfio: failed to get group %d", groupid);
>>> +        return -ENOENT;
>>> +    }
>>> +
>>> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
>>> +
>>> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>>> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
>>> +            error_report("vfio: error: device %s is already attached", path);
>>> +            vfio_put_group(group);
>>> +            return -EBUSY;
>>> +        }
>>> +    }
>>> +    ret = vfio_get_device(group, path, vbasedev);
>>> +    if (ret) {
>>> +        error_report("vfio: failed to get device %s", path);
>>> +        vfio_put_group(group);
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +/**
>>> + * vfio_map_region - initialize the 2 mr (mmapped on ops) for a
>>> + * given index
>>> + * @vdev: the VFIO platform device
>>> + * @nr: the index of the region
>>> + *
>>> + * init the top memory region and the mmapped memroy region beneath
>>> + * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
>>> + * and could not be passed to memory region functions
>>> +*/
>>> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
>>> +{
>>> +    VFIORegion *region = vdev->regions[nr];
>>> +    unsigned size = region->size;
>>> +    char name[64];
>>> +
>>> +    if (!size) {
>>> +        return;
>>> +    }
>>> +
>>> +    snprintf(name, sizeof(name), "VFIO %s region %d",
>>> +             vdev->vbasedev.name, nr);
>>> +
>>> +    /* A "slow" read/write mapping underlies all regions */
>>> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
>>> +                          region, name, size);
>>> +
>>> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
>>> +
>>> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
>>> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
>>> +        error_report("%s unsupported. Performance may be slow", name);
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * vfio_platform_realize - the device realize function
>>> + * @dev: device state pointer
>>> + * @errp: error
>>> + *
>>> + * initialize the device, its memory regions and IRQ structures
>>> + * IRQ are started separately
>>> + */
>>> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
>>> +{
>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
>>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>> +    int i, ret;
>>> +
>>> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
>>> +    vbasedev->ops = &vfio_platform_ops;
>>> +    vdev->start_irq_fn = vfio_start_eventfd_injection;
>>> +
>>> +    trace_vfio_platform_realize(vbasedev->name, vdev->compat);
>>> +
>>> +    ret = vfio_base_device_init(vbasedev);
>>> +    if (ret) {
>>> +        error_setg(errp, "vfio: vfio_base_device_init failed for %s",
>>> +                   vbasedev->name);
>>> +        return;
>>> +    }
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        vfio_map_region(vdev, i);
>>> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
>>> +    }
>>> +}
>>> +
>>> +/*
>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>> + * this is needed since at finalize time, the device IRQ are not yet
>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>> + * init done notifier registered by the machine file. After its execution
>>> + * we execute a new notifier that actually starts the injection. When using
>>> + * irqfd, programming the injection consists in associating eventfds to
>>> + * GSI number,ie. virtual IRQ number
>>> + */
>>> +
>>> +typedef struct VfioIrqStarterNotifierParams {
>>> +    unsigned int platform_bus_first_irq;
>>> +    Notifier notifier;
>>> +} VfioIrqStarterNotifierParams;
>>> +
>>> +typedef struct VfioIrqStartParams {
>>> +    PlatformBusDevice *pbus;
>>> +    int platform_bus_first_irq;
>>> +} VfioIrqStartParams;
>>> +
>>> +/* Start injection of IRQ for a specific VFIO device */
>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>> +{
>>> +    int i;
>>> +    VfioIrqStartParams *p = opaque;
>>> +    VFIOPlatformDevice *vdev;
>>> +    VFIODevice *vbasedev;
>>> +    uint64_t irq_number;
>>> +    PlatformBusDevice *pbus = p->pbus;
>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>> +
>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>> +        vbasedev = &vdev->vbasedev;
>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>> +                             + platform_bus_first_irq;
>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>> +{
>>> +    VfioIrqStarterNotifierParams *p =
>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>> +    DeviceState *dev =
>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>> +
>>> +    if (pbus->done_gathering) {
>>> +        VfioIrqStartParams data = {
>>> +            .pbus = pbus,
>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>> +        };
>>> +
>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>> +    }
>>> +}
>>> +
>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>> +{
>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>> +
>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>
>> Could you add a notifier for each device instead? Then the notifier
>> would be part of the vfio device struct and not some dangling random
>> pointer :).
>>
>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>> know the device you're dealing with and only handle a single device per
>> notifier.
>
> Hi Alex,
>
> Indeed I can do that and put the foreach in the machine file instead.
> This means however more code in virt.c, in the create_platform_bus
> function. If Peter agrees with that I will proceed.
>
> I take the opportunity to ask a question I did not dare to ask yet about
> qemu_irq ;-). Wouldn't it make sense to create an accessor to be able to
> retrieve the IRQ number (n field). Indeed I currently do some gym to
> pass the platform bus first irq and it would be definitively simpler to
> directly retrieve n from qemu_irq. Besides I think we also have this
> need when setting up irqfd for vhost net to associate the gsi with guest
> notifier.

No, a qemu_irq object only knows the connection it establishes. The
bigger picture of what number it has is bus / machine specific. That's
what I added the easy platform_bus_get_irqn() helper for ;).

Alex
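
For illustration only, the per-device notifier idea discussed above could look roughly like the standalone sketch below. It does not use QEMU headers: `Notifier` and `container_of` here are simplified stand-ins written for this example, and `DemoVfioDevice`, `demo_add_init_done_notifier`, `demo_run_init_done_notifiers`, `demo_irq_starter_notify`, `demo_device_register_irq_starter` and `demo_run` are hypothetical names, not functions from the patch or from QEMU. Starting an IRQ is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a notifier: a callback plus a list link. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
    Notifier *next;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device: the notifier is a member of the device struct,
 * so no separately allocated parameter block is needed. */
typedef struct DemoVfioDevice {
    const char *name;
    int num_irqs;
    int irqs_started;
    Notifier init_done_notifier;
} DemoVfioDevice;

/* Hypothetical global list standing in for "machine init done" notifiers. */
static Notifier *init_done_list;

static void demo_add_init_done_notifier(Notifier *n)
{
    n->next = init_done_list;
    init_done_list = n;
}

static void demo_run_init_done_notifiers(void)
{
    for (Notifier *n = init_done_list; n; n = n->next) {
        n->notify(n, NULL);
    }
}

/* Each notifier handles exactly one device, found via container_of. */
static void demo_irq_starter_notify(Notifier *notifier, void *data)
{
    DemoVfioDevice *dev =
        container_of(notifier, DemoVfioDevice, init_done_notifier);

    (void)data;
    for (int i = 0; i < dev->num_irqs; i++) {
        dev->irqs_started++; /* real code would start injection of IRQ i */
    }
    printf("%s: started %d IRQ(s)\n", dev->name, dev->irqs_started);
}

static void demo_device_register_irq_starter(DemoVfioDevice *dev)
{
    dev->init_done_notifier.notify = demo_irq_starter_notify;
    demo_add_init_done_notifier(&dev->init_done_notifier);
}

/* Registers two devices, fires the notifiers, returns total IRQs started. */
int demo_run(void)
{
    DemoVfioDevice a = { .name = "dev-a", .num_irqs = 2 };
    DemoVfioDevice b = { .name = "dev-b", .num_irqs = 3 };

    init_done_list = NULL;
    demo_device_register_irq_starter(&a);
    demo_device_register_irq_starter(&b);
    demo_run_init_done_notifiers();

    assert(a.irqs_started == 2 && b.irqs_started == 3);
    init_done_list = NULL; /* a and b go out of scope; drop the links */
    return a.irqs_started + b.irqs_started;
}
```

Calling `demo_run()` from a `main` is enough to exercise it. The point of the sketch is only the ownership model: with the notifier embedded in the device, the callback reaches its device directly and there is no loop over all sysbus devices inside the notifier.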