Alex, Thanks for taking closer look. I'll incorporate all the nits you suggested. On 6/22/2016 3:00 AM, Alex Williamson wrote: > On Mon, 20 Jun 2016 22:01:46 +0530 > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > ... >> + >> +config MDEV >> + tristate "Mediated device driver framework" >> + depends on VFIO >> + default n >> + help >> + MDEV provides a framework to virtualize device without SR-IOV cap >> + See Documentation/mdev.txt for more details. > > Documentation pointer still doesn't exist. Perhaps this file would be > a more appropriate place than the commit log for some of the > information above. > Sure, I'll add these details to documentation. > Every time I review this I'm struggling to figure out why this isn't > VFIO_MDEV since it's really tied to vfio and difficult to evaluate it > as some sort of standalone mediated device interface. I don't know > the answer, but it always strikes me as a discontinuity. > Ok. I'll change to VFIO_MDEV >> + >> +static struct mdev_device *find_mdev_device(struct parent_device *parent, >> + uuid_le uuid, int instance) >> +{ >> + struct mdev_device *mdev = NULL, *p; >> + >> + list_for_each_entry(p, &parent->mdev_list, next) { >> + if ((uuid_le_cmp(p->uuid, uuid) == 0) && >> + (p->instance == instance)) { >> + mdev = p; > > Locking here is still broken, the callers are create and destroy, which > can still race each other and themselves. > Fixed it. >> + >> +static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params) >> +{ >> + struct parent_device *parent = mdev->parent; >> + int ret; >> + >> + mutex_lock(&parent->ops_lock); >> + if (parent->ops->create) { > > How would a parent_device without ops->create or ops->destroy useful? > Perhaps mdev_register_driver() should enforce required ops. mdev.h > should at least document which ops are optional if they really are > optional. Makes sense, adding check in mdev_register_driver() to mandate create and destroy in ops. I'll also update the comments in mdev.h for mandatory and optional ops. > >> + ret = parent->ops->create(mdev->dev.parent, mdev->uuid, >> + mdev->instance, mdev_params); >> + if (ret) >> + goto create_ops_err; >> + } >> + >> + ret = mdev_add_attribute_group(&mdev->dev, >> + parent->ops->mdev_attr_groups); > > An error here seems to put us in a bad place, the device is created but > the attributes are broken, is it the caller's responsibility to > destroy? Seems like we need a cleanup if this fails. > Right, adding cleanup here. >> +create_ops_err: >> + mutex_unlock(&parent->ops_lock); > > It seems like ops_lock isn't used so much as a lock as a serialization > mechanism. Why? Where is this serialization per parent device > documented? > parent->ops_lock is to serialize parent device callbacks to vendor driver, i.e supported_config(), create() and destroy(). mdev->ops_lock is to serialize mediated device related callbacks to vendor driver, i.e. start(), stop(), read(), write(), set_irqs(), get_region_info(), validate_map_request(). Its not documented, I'll add comments to mdev.h about these locks. >> + return ret; >> +} >> + >> +static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force) >> +{ >> + struct parent_device *parent = mdev->parent; >> + int ret = 0; >> + >> + /* >> + * If vendor driver doesn't return success that means vendor >> + * driver doesn't support hot-unplug >> + */ >> + mutex_lock(&parent->ops_lock); >> + if (parent->ops->destroy) { >> + ret = parent->ops->destroy(parent->dev, mdev->uuid, >> + mdev->instance); >> + if (ret && !force) { > > It seems this is not so much a 'force' but an ignore errors, we never > actually force the mdev driver to destroy the device... which makes me > wonder if there are leaks there. > Consider a case where VM is running or in teardown path and parent device in unbound from vendor driver, then vendor driver would call mdev_unregister_device() from its remove() call. Even if parent->ops->destroy() returns error that could also mean that hot-unplug is not supported but we have to destroy mdev device. remove() call doesn't honor error returned. In that case its a force removal. >> + >> +/* >> + * mdev_unregister_device : Unregister a parent device >> + * @dev: device structure representing parent device. >> + * >> + * Remove device from list of registered parent devices. Give a chance to free >> + * existing mediated devices for given device. >> + */ >> + >> +void mdev_unregister_device(struct device *dev) >> +{ >> + struct parent_device *parent; >> + struct mdev_device *mdev, *n; >> + int ret; >> + >> + mutex_lock(&parent_devices.list_lock); >> + parent = find_parent_device(dev); >> + >> + if (!parent) { >> + mutex_unlock(&parent_devices.list_lock); >> + return; >> + } >> + dev_info(dev, "MDEV: Unregistering\n"); >> + >> + /* >> + * Remove parent from the list and remove create and destroy sysfs >> + * files so that no new mediated device could be created for this parent >> + */ >> + list_del(&parent->next); >> + mdev_remove_sysfs_files(dev); >> + mutex_unlock(&parent_devices.list_lock); >> + >> + mutex_lock(&parent->ops_lock); >> + mdev_remove_attribute_group(dev, >> + parent->ops->dev_attr_groups); >> + mutex_unlock(&parent->ops_lock); >> + >> + mutex_lock(&parent->mdev_list_lock); >> + list_for_each_entry_safe(mdev, n, &parent->mdev_list, next) { >> + mdev_device_destroy_ops(mdev, true); >> + list_del(&mdev->next); >> + mdev_put_device(mdev); >> + } >> + mutex_unlock(&parent->mdev_list_lock); >> + >> + do { >> + ret = wait_event_interruptible_timeout(parent->release_done, >> + list_empty(&parent->mdev_list), HZ * 10); > > But we do a list_del for each mdev in mdev_list above, how could the > list not be empty here? I think you're trying to wait for all the mdev > devices to be released, but I don't think this does that. Isn't the > list empty regardless? > Right, I do want to wait for all the mdev devices to be released. Moving list_del(&mdev->next) from the above for loop to mdev_release_device() so that mdev will be removed from list on last mdev_put_device(). >> + if (ret == -ERESTARTSYS) { >> + dev_warn(dev, "Mediated devices are in use, task" >> + " \"%s\" (%d) " >> + "blocked until all are released", >> + current->comm, task_pid_nr(current)); >> + } >> + } while (ret <= 0); >> + >> + mdev_put_parent(parent); >> +} >> +EXPORT_SYMBOL(mdev_unregister_device); >> + >> + >> +int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance, >> + char *mdev_params) >> +{ >> + int ret; >> + struct mdev_device *mdev; >> + struct parent_device *parent; >> + >> + parent = mdev_get_parent_by_dev(dev); >> + if (!parent) >> + return -EINVAL; >> + >> + /* Check for duplicate */ >> + mdev = find_mdev_device(parent, uuid, instance); > > But this doesn't actually prevent duplicates because we we're not > holding any lock the guarantee that another racing process doesn't > create the same {uuid,instance} between where we check and the below > list_add. > Oops I missed this race condition. Moving mutex_lock(&parent->mdev_list_lock); before find_mdev_device() in mdev_device_create() and mdev_device_destroy(). >> + >> +int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance, >> + char *mdev_params); >> +int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance); >> +void mdev_device_supported_config(struct device *dev, char *str); >> +int mdev_device_start(uuid_le uuid); >> +int mdev_device_shutdown(uuid_le uuid); > > nit, stop is start as startup is to shutdown. IOW, should this be > mdev_device_stop()? > Ok. Renaming mdev_device_shutdown() to mdev_device_stop(). >> + >> +struct pci_region_info { >> + uint64_t start; >> + uint64_t size; >> + uint32_t flags; /* VFIO region info flags */ >> +}; >> + >> +enum mdev_emul_space { >> + EMUL_CONFIG_SPACE, /* PCI configuration space */ >> + EMUL_IO, /* I/O register space */ >> + EMUL_MMIO /* Memory-mapped I/O space */ >> +}; > > > I'm still confused why this is needed, perhaps a description here would > be useful so I can stop asking. Clearly config space is PCI only, so > it's strange to have it in the common code. Everyone not on x86 will > say I/O space is also strange. I can't keep it in my head why the > read/write offsets aren't sufficient for the driver to figure out what > type it is. > > Now that VFIO_PCI_OFFSET_* macros are moved to vfio.h which vendor driver can also use, above enum could be removed from read/write. But again these macros are useful when parent device is PCI device. How would non-pci parent device differentiate IO ports and MMIO? >> + int (*get_region_info)(struct mdev_device *vdev, int region_index, >> + struct pci_region_info *region_info); > > This can't be //pci_//region_info. How do you intend to support things > like sparse mmap capabilities in the user REGION_INFO ioctl when such > things are not part of the mediated device API? Seems like the driver > should just return a buffer. > If not pci_region_info, can use vfio_region_info here, even to fetch sparce mmap capabilities from vendor driver? Thanks, Kirti. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html