Re: [PATCH 1/3] Mediated device Core driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alex,

Thanks for taking closer look. I'll incorporate all the nits you suggested.

On 6/22/2016 3:00 AM, Alex Williamson wrote:
> On Mon, 20 Jun 2016 22:01:46 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>
...
>> +
>> +config MDEV
>> +    tristate "Mediated device driver framework"
>> +    depends on VFIO
>> +    default n
>> +    help
>> +        MDEV provides a framework to virtualize device without SR-IOV cap
>> +        See Documentation/mdev.txt for more details.
> 
> Documentation pointer still doesn't exist.  Perhaps this file would be
> a more appropriate place than the commit log for some of the
> information above.
> 

Sure, I'll add these details to documentation.

> Every time I review this I'm struggling to figure out why this isn't
> VFIO_MDEV since it's really tied to vfio and difficult to evaluate it
> as some sort of standalone mediated device interface.  I don't know
> the answer, but it always strikes me as a discontinuity.
> 

Ok. I'll change to VFIO_MDEV

>> +
>> +static struct mdev_device *find_mdev_device(struct parent_device *parent,
>> +					    uuid_le uuid, int instance)
>> +{
>> +	struct mdev_device *mdev = NULL, *p;
>> +
>> +	list_for_each_entry(p, &parent->mdev_list, next) {
>> +		if ((uuid_le_cmp(p->uuid, uuid) == 0) &&
>> +		    (p->instance == instance)) {
>> +			mdev = p;
> 
> Locking here is still broken, the callers are create and destroy, which
> can still race each other and themselves.
>

Fixed it.

>> +
>> +static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
>> +{
>> +	struct parent_device *parent = mdev->parent;
>> +	int ret;
>> +
>> +	mutex_lock(&parent->ops_lock);
>> +	if (parent->ops->create) {
> 
> How would a parent_device without ops->create or ops->destroy useful?
> Perhaps mdev_register_driver() should enforce required ops.  mdev.h
> should at least document which ops are optional if they really are
> optional.

Makes sense, adding check in mdev_register_driver() to mandate create
and destroy in ops. I'll also update the comments in mdev.h for
mandatory and optional ops.

> 
>> +		ret = parent->ops->create(mdev->dev.parent, mdev->uuid,
>> +					mdev->instance, mdev_params);
>> +		if (ret)
>> +			goto create_ops_err;
>> +	}
>> +
>> +	ret = mdev_add_attribute_group(&mdev->dev,
>> +					parent->ops->mdev_attr_groups);
> 
> An error here seems to put us in a bad place, the device is created but
> the attributes are broken, is it the caller's responsibility to
> destroy?  Seems like we need a cleanup if this fails.
> 

Right, adding cleanup here.

>> +create_ops_err:
>> +	mutex_unlock(&parent->ops_lock);
> 
> It seems like ops_lock isn't used so much as a lock as a serialization
> mechanism.  Why?  Where is this serialization per parent device
> documented?
>

parent->ops_lock is to serialize parent device callbacks to vendor
driver, i.e supported_config(), create() and destroy().
mdev->ops_lock is to serialize mediated device related callbacks to
vendor driver, i.e. start(), stop(), read(), write(), set_irqs(),
get_region_info(), validate_map_request().
Its not documented, I'll add comments to mdev.h about these locks.


>> +	return ret;
>> +}
>> +
>> +static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force)
>> +{
>> +	struct parent_device *parent = mdev->parent;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * If vendor driver doesn't return success that means vendor
>> +	 * driver doesn't support hot-unplug
>> +	 */
>> +	mutex_lock(&parent->ops_lock);
>> +	if (parent->ops->destroy) {
>> +		ret = parent->ops->destroy(parent->dev, mdev->uuid,
>> +					   mdev->instance);
>> +		if (ret && !force) {
> 
> It seems this is not so much a 'force' but an ignore errors, we never
> actually force the mdev driver to destroy the device... which makes me
> wonder if there are leaks there.
> 

Consider a case where VM is running or in teardown path and parent
device in unbound from vendor driver, then vendor driver would call
mdev_unregister_device() from its remove() call. Even if
parent->ops->destroy() returns error that could also mean that
hot-unplug is not supported but we have to destroy mdev device. remove()
call doesn't honor error returned. In that case its a force removal.

>> +
>> +/*
>> + * mdev_unregister_device : Unregister a parent device
>> + * @dev: device structure representing parent device.
>> + *
>> + * Remove device from list of registered parent devices. Give a chance to free
>> + * existing mediated devices for given device.
>> + */
>> +
>> +void mdev_unregister_device(struct device *dev)
>> +{
>> +	struct parent_device *parent;
>> +	struct mdev_device *mdev, *n;
>> +	int ret;
>> +
>> +	mutex_lock(&parent_devices.list_lock);
>> +	parent = find_parent_device(dev);
>> +
>> +	if (!parent) {
>> +		mutex_unlock(&parent_devices.list_lock);
>> +		return;
>> +	}
>> +	dev_info(dev, "MDEV: Unregistering\n");
>> +
>> +	/*
>> +	 * Remove parent from the list and remove create and destroy sysfs
>> +	 * files so that no new mediated device could be created for this parent
>> +	 */
>> +	list_del(&parent->next);
>> +	mdev_remove_sysfs_files(dev);
>> +	mutex_unlock(&parent_devices.list_lock);
>> +
>> +	mutex_lock(&parent->ops_lock);
>> +	mdev_remove_attribute_group(dev,
>> +				    parent->ops->dev_attr_groups);
>> +	mutex_unlock(&parent->ops_lock);
>> +
>> +	mutex_lock(&parent->mdev_list_lock);
>> +	list_for_each_entry_safe(mdev, n, &parent->mdev_list, next) {
>> +		mdev_device_destroy_ops(mdev, true);
>> +		list_del(&mdev->next);
>> +		mdev_put_device(mdev);
>> +	}
>> +	mutex_unlock(&parent->mdev_list_lock);
>> +
>> +	do {
>> +		ret = wait_event_interruptible_timeout(parent->release_done,
>> +				list_empty(&parent->mdev_list), HZ * 10);
> 
> But we do a list_del for each mdev in mdev_list above, how could the
> list not be empty here?  I think you're trying to wait for all the mdev
> devices to be released, but I don't think this does that.  Isn't the
> list empty regardless?
>

Right, I do want to wait for all the mdev devices to be released. Moving
list_del(&mdev->next) from the above for loop to mdev_release_device()
so that mdev will be removed from list on last mdev_put_device().


>> +		if (ret == -ERESTARTSYS) {
>> +			dev_warn(dev, "Mediated devices are in use, task"
>> +				      " \"%s\" (%d) "
>> +				      "blocked until all are released",
>> +				      current->comm, task_pid_nr(current));
>> +		}
>> +	} while (ret <= 0);
>> +
>> +	mdev_put_parent(parent);
>> +}
>> +EXPORT_SYMBOL(mdev_unregister_device);
>> +
>> +


>> +int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
>> +		       char *mdev_params)
>> +{
>> +	int ret;
>> +	struct mdev_device *mdev;
>> +	struct parent_device *parent;
>> +
>> +	parent = mdev_get_parent_by_dev(dev);
>> +	if (!parent)
>> +		return -EINVAL;
>> +
>> +	/* Check for duplicate */
>> +	mdev = find_mdev_device(parent, uuid, instance);
> 
> But this doesn't actually prevent duplicates because we we're not
> holding any lock the guarantee that another racing process doesn't
> create the same {uuid,instance} between where we check and the below
> list_add.
> 

Oops I missed this race condition. Moving
mutex_lock(&parent->mdev_list_lock);
before find_mdev_device() in mdev_device_create() and
mdev_device_destroy().


>> +
>> +int  mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
>> +			char *mdev_params);
>> +int  mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance);
>> +void mdev_device_supported_config(struct device *dev, char *str);
>> +int  mdev_device_start(uuid_le uuid);
>> +int  mdev_device_shutdown(uuid_le uuid);
> 
> nit, stop is start as startup is to shutdown.  IOW, should this be
> mdev_device_stop()?
>

Ok. Renaming mdev_device_shutdown() to mdev_device_stop().


>> +
>> +struct pci_region_info {
>> +	uint64_t start;
>> +	uint64_t size;
>> +	uint32_t flags;		/* VFIO region info flags */
>> +};
>> +
>> +enum mdev_emul_space {
>> +	EMUL_CONFIG_SPACE,	/* PCI configuration space */
>> +	EMUL_IO,		/* I/O register space */
>> +	EMUL_MMIO		/* Memory-mapped I/O space */
>> +};
> 
> 
> I'm still confused why this is needed, perhaps a description here would
> be useful so I can stop asking.  Clearly config space is PCI only, so
> it's strange to have it in the common code.  Everyone not on x86 will
> say I/O space is also strange.  I can't keep it in my head why the
> read/write offsets aren't sufficient for the driver to figure out what
> type it is.
> 
>

Now that VFIO_PCI_OFFSET_* macros are moved to vfio.h which vendor
driver can also use, above enum could be removed from read/write. But
again these macros are useful when parent device is PCI device. How
would non-pci parent device differentiate IO ports and MMIO?


>> +	int	(*get_region_info)(struct mdev_device *vdev, int region_index,
>> +				 struct pci_region_info *region_info);
> 
> This can't be //pci_//region_info.  How do you intend to support things
> like sparse mmap capabilities in the user REGION_INFO ioctl when such
> things are not part of the mediated device API?  Seems like the driver
> should just return a buffer.
>

If not pci_region_info, can use vfio_region_info here, even to fetch
sparce mmap capabilities from vendor driver?

Thanks,
Kirti.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux