Re: [PATCH v6 1/4] vfio: Mediated device Core driver

On Thu, 4 Aug 2016 00:33:51 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:

> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by different drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high Level block diagram, with Nvidia, Intel and IBM devices
> as example, since these are the devices which are going to actively use
> this module as of now.
> 
>  +---------------+
>  |               |
>  | +-----------+ |  mdev_register_driver() +--------------+
>  | |           | +<------------------------+ __init()     |
>  | |           | |                         |              |
>  | |  mdev     | +------------------------>+              |<-> VFIO user
>  | |  bus      | |     probe()/remove()    | vfio_mpci.ko |    APIs
>  | |  driver   | |                         |              |
>  | |           | |                         +--------------+
>  | |           | |  mdev_register_driver() +--------------+
>  | |           | +<------------------------+ __init()     |
>  | |           | |                         |              |
>  | |           | +------------------------>+              |<-> VFIO user
>  | +-----------+ |     probe()/remove()    | vfio_mccw.ko |    APIs
>  |               |                         |              |
>  |  MDEV CORE    |                         +--------------+
>  |   MODULE      |
>  |   mdev.ko     |
>  | +-----------+ |  mdev_register_device() +--------------+
>  | |           | +<------------------------+              |
>  | |           | |                         |  nvidia.ko   |<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | | Physical  | |
>  | |  device   | |  mdev_register_device() +--------------+
>  | | interface | |<------------------------+              |
>  | |           | |                         |  i915.ko     |<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | |           | |
>  | |           | |  mdev_register_device() +--------------+
>  | |           | +<------------------------+              |
>  | |           | |                         | ccw_device.ko|<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | +-----------+ |
>  +---------------+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>   * struct mdev_driver - Mediated device's driver
>   * @name: driver name
>   * @probe: called when new device created
>   * @remove:called when device removed
>   * @match: called when new device or driver is added for this bus.
> 	    Return 1 if given device can be handled by given driver and
> 	    zero otherwise.
>   * @driver:device driver structure
>   *
>   **/
> struct mdev_driver {
>          const char *name;
>          int  (*probe)  (struct device *dev);
>          void (*remove) (struct device *dev);
>          int  (*match)(struct device *dev);
>          struct device_driver    driver;
> };
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> Mediated device's driver for mdev should use this interface to register
> with Core driver. With this, mediated devices driver for such devices is
> responsible to add mediated device to VFIO group.
> 
> 2. Physical device driver interface
> This interface provides vendor driver the set APIs to manage physical
> device related work in their own driver. APIs are :
> - supported_config: provide supported configuration list by the vendor
> 		    driver
> - create: to allocate basic resources in vendor driver for a mediated
> 	  device.
> - destroy: to free resources in vendor driver when mediated device is
> 	   destroyed.
> - reset: to free and reallocate resources in vendor driver during reboot
> - start: to initiate mediated device initialization process from vendor
> 	 driver
> - shutdown: to teardown mediated device resources during teardown.
> - read : read emulation callback.
> - write: write emulation callback.
> - set_irqs: send interrupt configuration information that VMM sets.
> - get_region_info: to provide region size and its flags for the mediated
> 		   device.
> - validate_map_request: to validate remap pfn request.
> 
> This registration interface should be used by vendor drivers to register
> each physical device to mdev core driver.
> Locks to serialize above callbacks are removed. If required, vendor driver
> can have locks to serialize above APIs in their driver.
> 
> Added support to keep track of physical mappings for each mdev device.
> APIs to be used by mediated device bus driver to add and delete mappings to
> tracking logic:
> int mdev_add_phys_mapping(struct mdev_device *mdev,
>                           struct address_space *mapping,
>                           unsigned long addr, unsigned long size)
> void mdev_del_phys_mapping(struct mdev_device *mdev, unsigned long addr)
> 
> API to be used by vendor driver to invalidate mapping:
> int mdev_device_invalidate_mapping(struct mdev_device *mdev,
>                                    unsigned long addr, unsigned long size)
> 
> Signed-off-by: Kirti Wankhede <kwankhede@xxxxxxxxxx>
> Signed-off-by: Neo Jia <cjia@xxxxxxxxxx>
> Change-Id: I73a5084574270b14541c529461ea2f03c292d510
> ---
>  drivers/vfio/Kconfig             |   1 +
>  drivers/vfio/Makefile            |   1 +
>  drivers/vfio/mdev/Kconfig        |  12 +
>  drivers/vfio/mdev/Makefile       |   5 +
>  drivers/vfio/mdev/mdev_core.c    | 676 +++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/mdev/mdev_driver.c  | 142 ++++++++
>  drivers/vfio/mdev/mdev_private.h |  33 ++
>  drivers/vfio/mdev/mdev_sysfs.c   | 269 ++++++++++++++++
>  include/linux/mdev.h             | 236 ++++++++++++++
>  9 files changed, 1375 insertions(+)
>  create mode 100644 drivers/vfio/mdev/Kconfig
>  create mode 100644 drivers/vfio/mdev/Makefile
>  create mode 100644 drivers/vfio/mdev/mdev_core.c
>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
>  create mode 100644 drivers/vfio/mdev/mdev_private.h
>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
>  create mode 100644 include/linux/mdev.h
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index da6e2ce77495..23eced02aaf6 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -48,4 +48,5 @@ menuconfig VFIO_NOIOMMU
>  
>  source "drivers/vfio/pci/Kconfig"
>  source "drivers/vfio/platform/Kconfig"
> +source "drivers/vfio/mdev/Kconfig"
>  source "virt/lib/Kconfig"
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 7b8a31f63fea..4a23c13b6be4 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
>  obj-$(CONFIG_VFIO_PCI) += pci/
>  obj-$(CONFIG_VFIO_PLATFORM) += platform/
> +obj-$(CONFIG_VFIO_MDEV) += mdev/
> diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
> new file mode 100644
> index 000000000000..a34fbc66f92f
> --- /dev/null
> +++ b/drivers/vfio/mdev/Kconfig
> @@ -0,0 +1,12 @@
> +
> +config VFIO_MDEV
> +    tristate "Mediated device driver framework"
> +    depends on VFIO
> +    default n
> +    help
> +        Provides a framework to virtualize device.
> +	See Documentation/vfio-mediated-device.txt for more details.
> +
> +        If you don't know what do here, say N.
> +
> +
> diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
> new file mode 100644
> index 000000000000..56a75e689582
> --- /dev/null
> +++ b/drivers/vfio/mdev/Makefile
> @@ -0,0 +1,5 @@
> +
> +mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
> +
> +obj-$(CONFIG_VFIO_MDEV) += mdev.o
> +
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> new file mode 100644
> index 000000000000..90ff073abfce
> --- /dev/null
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -0,0 +1,676 @@
> +/*
> + * Mediated device Core Driver
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + *     Author: Neo Jia <cjia@xxxxxxxxxx>
> + *	       Kirti Wankhede <kwankhede@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/fs.h>
> +#include <linux/slab.h>
> +#include <linux/sched.h>
> +#include <linux/uuid.h>
> +#include <linux/vfio.h>
> +#include <linux/iommu.h>
> +#include <linux/sysfs.h>
> +#include <linux/mdev.h>
> +
> +#include "mdev_private.h"
> +
> +#define DRIVER_VERSION		"0.1"
> +#define DRIVER_AUTHOR		"NVIDIA Corporation"
> +#define DRIVER_DESC		"Mediated device Core Driver"
> +
> +#define MDEV_CLASS_NAME		"mdev"
> +
> +static LIST_HEAD(parent_list);
> +static DEFINE_MUTEX(parent_list_lock);
> +
> +static int mdev_add_attribute_group(struct device *dev,
> +				    const struct attribute_group **groups)
> +{
> +	return sysfs_create_groups(&dev->kobj, groups);
> +}
> +
> +static void mdev_remove_attribute_group(struct device *dev,
> +					const struct attribute_group **groups)
> +{
> +	sysfs_remove_groups(&dev->kobj, groups);
> +}
> +
> +/* Should be called holding parent->mdev_list_lock */

I often like to prepend "__" onto the name of functions like this to
signal a special calling convention.

> +static struct mdev_device *find_mdev_device(struct parent_device *parent,
> +					    uuid_le uuid, int instance)
> +{
> +	struct mdev_device *mdev;
> +
> +	list_for_each_entry(mdev, &parent->mdev_list, next) {
> +		if ((uuid_le_cmp(mdev->uuid, uuid) == 0) &&
> +		    (mdev->instance == instance))
> +			return mdev;
> +	}
> +	return NULL;
> +}
> +
> +/* Should be called holding parent_list_lock */
> +static struct parent_device *find_parent_device(struct device *dev)
> +{
> +	struct parent_device *parent;
> +
> +	list_for_each_entry(parent, &parent_list, next) {
> +		if (parent->dev == dev)
> +			return parent;
> +	}
> +	return NULL;
> +}
> +
> +static void mdev_release_parent(struct kref *kref)
> +{
> +	struct parent_device *parent = container_of(kref, struct parent_device,
> +						    ref);
> +	kfree(parent);
> +}
> +
> +static
> +inline struct parent_device *mdev_get_parent(struct parent_device *parent)
> +{
> +	if (parent)
> +		kref_get(&parent->ref);
> +
> +	return parent;
> +}
> +
> +static inline void mdev_put_parent(struct parent_device *parent)
> +{
> +	if (parent)
> +		kref_put(&parent->ref, mdev_release_parent);
> +}
> +
> +static struct parent_device *mdev_get_parent_by_dev(struct device *dev)
> +{
> +	struct parent_device *parent = NULL, *p;
> +
> +	mutex_lock(&parent_list_lock);
> +	list_for_each_entry(p, &parent_list, next) {
> +		if (p->dev == dev) {
> +			parent = mdev_get_parent(p);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&parent_list_lock);
> +	return parent;

Use what you've created:

{
	struct parent_device *parent;

	mutex_lock(&parent_list_lock);
	parent = mdev_get_parent(find_parent_device(dev));
	mutex_unlock(&parent_list_lock);

	return parent;
}

> +}
> +
> +static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
> +{
> +	struct parent_device *parent = mdev->parent;
> +	int ret;
> +
> +	ret = parent->ops->create(mdev, mdev_params);
> +	if (ret)
> +		return ret;
> +
> +	ret = mdev_add_attribute_group(&mdev->dev,
> +					parent->ops->mdev_attr_groups);
> +	if (ret)
> +		parent->ops->destroy(mdev);
> +
> +	return ret;
> +}
> +
> +static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force)
> +{
> +	struct parent_device *parent = mdev->parent;
> +	int ret = 0;
> +
> +	/*
> +	 * If vendor driver doesn't return success that means vendor
> +	 * driver doesn't support hot-unplug
> +	 */
> +	ret = parent->ops->destroy(mdev);
> +	if (ret && !force)
> +		return -EBUSY;

This still seems troublesome.  I'm not sure why we don't just require
hot-unplug support.  Without it, we seem to have a limbo state where a
device exists, but not fully.

> +
> +	mdev_remove_attribute_group(&mdev->dev,
> +				    parent->ops->mdev_attr_groups);
> +
> +	return ret;
> +}
> +
> +static void mdev_release_device(struct kref *kref)
> +{
> +	struct mdev_device *mdev = container_of(kref, struct mdev_device, ref);
> +	struct parent_device *parent = mdev->parent;
> +
> +	list_del(&mdev->next);
> +	mutex_unlock(&parent->mdev_list_lock);

Maybe worthy of a short comment to more obviously match this unlock to
the kref_put_mutex() below.
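
Something along these lines is what I have in mind for the comment.
This is only a userspace sketch that models the put-with-mutex pattern
with C11 atomics and pthreads; struct obj, obj_release() and
obj_put_mutex() are names invented for the example, not the kref API:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical refcounted object, standing in for struct mdev_device. */
struct obj {
	atomic_int refs;
	pthread_mutex_t *list_lock;	/* protects the list the object is on */
	int released;			/* set by the release callback */
};

/* Entered with *o->list_lock held by obj_put_mutex(); must drop it. */
static void obj_release(struct obj *o)
{
	/* ... unlink from the list here, still under the lock ... */
	o->released = 1;
	/* Pairs with the pthread_mutex_lock() in obj_put_mutex(). */
	pthread_mutex_unlock(o->list_lock);
}

/*
 * Drop a reference.  The lock is taken only when the count might reach
 * zero; if it does, release() runs with the lock still held.
 * Returns 1 if the object was released, 0 otherwise.
 */
static int obj_put_mutex(struct obj *o, void (*release)(struct obj *))
{
	int old = atomic_load(&o->refs);

	/* Fast path: decrement without the lock while other refs remain. */
	while (old > 1) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old - 1))
			return 0;
	}

	pthread_mutex_lock(o->list_lock);
	if (atomic_fetch_sub(&o->refs, 1) != 1) {
		/* A new reference appeared meanwhile; not the last put. */
		pthread_mutex_unlock(o->list_lock);
		return 0;
	}
	release(o);	/* returns with list_lock dropped */
	return 1;
}
```

Without the pairing comment, the unlock in the release callback looks
unbalanced to anyone reading that function on its own.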

> +
> +	device_unregister(&mdev->dev);
> +	wake_up(&parent->release_done);
> +	mdev_put_parent(parent);
> +}
> +
> +struct mdev_device *mdev_get_device(struct mdev_device *mdev)
> +{
> +	kref_get(&mdev->ref);
> +	return mdev;
> +}
> +EXPORT_SYMBOL(mdev_get_device);

Is the intention here that the caller already has a reference to mdev
and wants to get another?  I also see cases where locking is used to
get this reference.  There's potential to misuse this, if not
outright abuse it, which worries me.  A reference cannot be
spontaneously generated; it needs to be sourced from somewhere.
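
For illustration only, a userspace sketch of the distinction I'm
drawing.  The names struct obj, obj_get() and obj_get_unless_zero() are
made up for this example and C11 atomics stand in for kref; in the
kernel, kref_get_unless_zero() plays the lookup-side role:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object, standing in for struct mdev_device. */
struct obj {
	atomic_int refs;
};

/* Unconditional get: only valid if the caller already holds a reference. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refs, 1);
}

/*
 * Lookup-side get: take a reference only if the count has not already
 * dropped to zero, i.e. the object is not on its way to being released.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refs);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return true;
	}
	return false;
}
```

An exported, unconditional get gives callers no way to tell which of
the two situations they are in.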

> +
> +void mdev_put_device(struct mdev_device *mdev)
> +{
> +	struct parent_device *parent = mdev->parent;
> +
> +	kref_put_mutex(&mdev->ref, mdev_release_device,
> +		       &parent->mdev_list_lock);
> +}
> +EXPORT_SYMBOL(mdev_put_device);
> +
> +/*
> + * Find first mediated device from given uuid and increment refcount of
> + * mediated device. Caller should call mdev_put_device() when the use of
> + * mdev_device is done.
> + */
> +static struct mdev_device *mdev_get_first_device_by_uuid(uuid_le uuid)
> +{
> +	struct mdev_device *mdev = NULL, *p;
> +	struct parent_device *parent;
> +
> +	mutex_lock(&parent_list_lock);
> +	list_for_each_entry(parent, &parent_list, next) {
> +		mutex_lock(&parent->mdev_list_lock);
> +		list_for_each_entry(p, &parent->mdev_list, next) {
> +			if (uuid_le_cmp(p->uuid, uuid) == 0) {
> +				mdev = mdev_get_device(p);
> +				break;
> +			}
> +		}
> +		mutex_unlock(&parent->mdev_list_lock);
> +
> +		if (mdev)
> +			break;
> +	}
> +	mutex_unlock(&parent_list_lock);
> +	return mdev;
> +}

This is used later by mdev_device_start() and mdev_device_stop() to get
the parent_device so it can call the start and stop ops callbacks
respectively.  That seems to imply that all of the instances for a given
uuid come from the same parent_device.  Where is that enforced?  I'm
still having a hard time buying into the uuid+instance plan when it
seems like each mdev_device should have an actual unique uuid.
Userspace tools can figure out which uuids to start for a given user, I
don't see much value in collecting them to instances within a uuid.

> +
> +/*
> + * Find mediated device from given iommu_group and increment refcount of
> + * mediated device. Caller should call mdev_put_device() when the use of
> + * mdev_device is done.
> + */
> +struct mdev_device *mdev_get_device_by_group(struct iommu_group *group)
> +{
> +	struct mdev_device *mdev = NULL, *p;
> +	struct parent_device *parent;
> +
> +	mutex_lock(&parent_list_lock);
> +	list_for_each_entry(parent, &parent_list, next) {
> +		mutex_lock(&parent->mdev_list_lock);
> +		list_for_each_entry(p, &parent->mdev_list, next) {
> +			if (!p->group)
> +				continue;
> +
> +			if (iommu_group_id(p->group) == iommu_group_id(group)) {
> +				mdev = mdev_get_device(p);
> +				break;
> +			}
> +		}
> +		mutex_unlock(&parent->mdev_list_lock);
> +
> +		if (mdev)
> +			break;
> +	}
> +	mutex_unlock(&parent_list_lock);
> +	return mdev;
> +}
> +EXPORT_SYMBOL(mdev_get_device_by_group);
> +
> +/*
> + * mdev_register_device : Register a device
> + * @dev: device structure representing parent device.
> + * @ops: Parent device operation structure to be registered.
> + *
> + * Add device to list of registered parent devices.
> + * Returns a negative value on error, otherwise 0.
> + */
> +int mdev_register_device(struct device *dev, const struct parent_ops *ops)
> +{
> +	int ret = 0;
> +	struct parent_device *parent;
> +
> +	if (!dev || !ops)
> +		return -EINVAL;
> +
> +	/* check for mandatory ops */
> +	if (!ops->create || !ops->destroy)
> +		return -EINVAL;
> +
> +	mutex_lock(&parent_list_lock);
> +
> +	/* Check for duplicate */
> +	parent = find_parent_device(dev);
> +	if (parent) {
> +		ret = -EEXIST;
> +		goto add_dev_err;
> +	}
> +
> +	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> +	if (!parent) {
> +		ret = -ENOMEM;
> +		goto add_dev_err;
> +	}
> +
> +	kref_init(&parent->ref);
> +	list_add(&parent->next, &parent_list);
> +
> +	parent->dev = dev;
> +	parent->ops = ops;
> +	mutex_init(&parent->mdev_list_lock);
> +	INIT_LIST_HEAD(&parent->mdev_list);
> +	init_waitqueue_head(&parent->release_done);
> +	mutex_unlock(&parent_list_lock);
> +
> +	ret = mdev_create_sysfs_files(dev);
> +	if (ret)
> +		goto add_sysfs_error;
> +
> +	ret = mdev_add_attribute_group(dev, ops->dev_attr_groups);
> +	if (ret)
> +		goto add_group_error;
> +
> +	dev_info(dev, "MDEV: Registered\n");
> +	return 0;
> +
> +add_group_error:
> +	mdev_remove_sysfs_files(dev);
> +add_sysfs_error:
> +	mutex_lock(&parent_list_lock);
> +	list_del(&parent->next);
> +	mutex_unlock(&parent_list_lock);
> +	mdev_put_parent(parent);
> +	return ret;
> +
> +add_dev_err:
> +	mutex_unlock(&parent_list_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(mdev_register_device);
> +
> +/*
> + * mdev_unregister_device : Unregister a parent device
> + * @dev: device structure representing parent device.
> + *
> + * Remove device from list of registered parent devices. Give a chance to free
> + * existing mediated devices for given device.
> + */
> +
> +void mdev_unregister_device(struct device *dev)
> +{
> +	struct parent_device *parent;
> +	struct mdev_device *mdev, *n;

Above, *p was used for the temp pointer.

> +	int ret;
> +
> +	mutex_lock(&parent_list_lock);
> +	parent = find_parent_device(dev);
> +
> +	if (!parent) {
> +		mutex_unlock(&parent_list_lock);
> +		return;
> +	}
> +	dev_info(dev, "MDEV: Unregistering\n");
> +
> +	/*
> +	 * Remove parent from the list and remove create and destroy sysfs

Quoting "create" and "destroy" would make this a bit more readable:

	Remove parent from the list and remove "create" and "destroy"
	sysfs...

Took me a couple reads to figure out "remove create" wasn't a typo.

> +	 * files so that no new mediated device could be created for this parent
> +	 */
> +	list_del(&parent->next);
> +	mdev_remove_sysfs_files(dev);
> +	mutex_unlock(&parent_list_lock);
> +
> +	mdev_remove_attribute_group(dev,
> +				    parent->ops->dev_attr_groups);
> +

Why do we need to remove sysfs files under the parent_list_lock?

> +	mutex_lock(&parent->mdev_list_lock);
> +	list_for_each_entry_safe(mdev, n, &parent->mdev_list, next) {
> +		mdev_device_destroy_ops(mdev, true);
> +		mutex_unlock(&parent->mdev_list_lock);
> +		mdev_put_device(mdev);
> +		mutex_lock(&parent->mdev_list_lock);

*cringe*  Releasing the list lock inside the traversal always makes me
nervous.  What about using list_first_entry() instead?  I don't think
list_for_each_entry_safe() really makes this safe from concurrent
operations on the list once we drop that lock.
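
Roughly what I mean, as a standalone userspace sketch using pthreads
rather than the kernel list helpers.  struct node, struct parent,
pop_first() and drain() are hypothetical names, and unlike the patch
this version unlinks the entry in the loop itself rather than in the
release path:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for mdev_device and parent_device. */
struct node {
	struct node *next;
	int destroyed;
};

struct parent {
	pthread_mutex_t lock;	/* protects head */
	struct node *head;
};

/* Unlink and return the first entry, or NULL when the list is empty. */
static struct node *pop_first(struct parent *p)
{
	struct node *n;

	pthread_mutex_lock(&p->lock);
	n = p->head;
	if (n)
		p->head = n->next;
	pthread_mutex_unlock(&p->lock);
	return n;
}

/* Work that has to run without the list lock held. */
static void destroy_node(struct node *n)
{
	n->next = NULL;
	n->destroyed = 1;
}

/*
 * Drain the list.  No iterator state is carried across the unlocked
 * section: every pass re-reads the head under the lock.
 */
static int drain(struct parent *p)
{
	struct node *n;
	int count = 0;

	while ((n = pop_first(p)) != NULL) {
		destroy_node(n);
		count++;
	}
	return count;
}
```

Because each pass starts again from the head, a concurrent add or
remove during the unlocked section cannot leave the loop holding a
stale next pointer.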

> +	}
> +	mutex_unlock(&parent->mdev_list_lock);
> +
> +	do {
> +		ret = wait_event_interruptible_timeout(parent->release_done,
> +				list_empty(&parent->mdev_list), HZ * 10);
> +		if (ret == -ERESTARTSYS) {
> +			dev_warn(dev, "Mediated devices are in use, task"
> +				      " \"%s\" (%d) "
> +				      "blocked until all are released",
> +				      current->comm, task_pid_nr(current));
> +		}
> +	} while (ret <= 0);
> +
> +	mdev_put_parent(parent);
> +}
> +EXPORT_SYMBOL(mdev_unregister_device);
> +
> +/*
> + * Functions required for mdev_sysfs
> + */
> +static void mdev_device_release(struct device *dev)
> +{
> +	struct mdev_device *mdev = to_mdev_device(dev);
> +
> +	dev_dbg(&mdev->dev, "MDEV: destroying\n");
> +	kfree(mdev);
> +}
> +
> +int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
> +		       char *mdev_params)
> +{
> +	int ret;
> +	struct mdev_device *mdev;
> +	struct parent_device *parent;
> +
> +	parent = mdev_get_parent_by_dev(dev);
> +	if (!parent)
> +		return -EINVAL;
> +
> +	mutex_lock(&parent->mdev_list_lock);
> +	/* Check for duplicate */
> +	mdev = find_mdev_device(parent, uuid, instance);
> +	if (mdev) {
> +		ret = -EEXIST;
> +		goto create_err;
> +	}
> +
> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> +	if (!mdev) {
> +		ret = -ENOMEM;
> +		goto create_err;
> +	}
> +
> +	memcpy(&mdev->uuid, &uuid, sizeof(uuid_le));
> +	mdev->instance = instance;
> +	mdev->parent = parent;
> +	kref_init(&mdev->ref);
> +
> +	mdev->dev.parent  = dev;
> +	mdev->dev.bus     = &mdev_bus_type;
> +	mdev->dev.release = mdev_device_release;
> +	dev_set_name(&mdev->dev, "%pUl-%d", uuid.b, instance);
> +
> +	ret = device_register(&mdev->dev);
> +	if (ret) {
> +		put_device(&mdev->dev);
> +		goto create_err;
> +	}
> +
> +	ret = mdev_device_create_ops(mdev, mdev_params);
> +	if (ret)
> +		goto create_failed;
> +
> +	list_add(&mdev->next, &parent->mdev_list);
> +	mutex_unlock(&parent->mdev_list_lock);
> +
> +	dev_dbg(&mdev->dev, "MDEV: created\n");
> +
> +	return ret;
> +
> +create_failed:
> +	device_unregister(&mdev->dev);
> +
> +create_err:
> +	mutex_unlock(&parent->mdev_list_lock);
> +	mdev_put_parent(parent);
> +	return ret;
> +}
> +
> +int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
> +{
> +	struct mdev_device *mdev;
> +	struct parent_device *parent;
> +	int ret;
> +
> +	parent = mdev_get_parent_by_dev(dev);
> +	if (!parent)
> +		return -EINVAL;
> +
> +	mutex_lock(&parent->mdev_list_lock);
> +	mdev = find_mdev_device(parent, uuid, instance);
> +	if (!mdev) {
> +		ret = -EINVAL;

-ENODEV?

> +		goto destroy_err;
> +	}
> +
> +	ret = mdev_device_destroy_ops(mdev, false);
> +	if (ret)
> +		goto destroy_err;
> +
> +	mutex_unlock(&parent->mdev_list_lock);
> +	mdev_put_device(mdev);
> +
> +	mdev_put_parent(parent);
> +	return ret;
> +
> +destroy_err:
> +	mutex_unlock(&parent->mdev_list_lock);
> +	mdev_put_parent(parent);
> +	return ret;
> +}
> +
> +int mdev_device_invalidate_mapping(struct mdev_device *mdev,
> +				   unsigned long addr, unsigned long size)
> +{
> +	int ret = -EINVAL;
> +	struct mdev_phys_mapping *phys_mappings;
> +	struct addr_desc *addr_desc;
> +
> +	if (!mdev || !mdev->phys_mappings.mapping)
> +		return ret;
> +
> +	phys_mappings = &mdev->phys_mappings;
> +
> +	mutex_lock(&phys_mappings->addr_desc_list_lock);
> +
> +	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
> +
> +		if ((addr > addr_desc->start) &&
> +		    (addr + size < addr_desc->start + addr_desc->size)) {

This looks incomplete.  At a minimum I think these should be >= and <=,
but that still only covers fully enclosed invalidation ranges.  Do we
need to support partial invalidations?
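
To make the boundary case concrete, a small sketch in plain C.  The
helper names are made up, ranges are treated as half-open, and address
overflow is ignored:

```c
#include <stdbool.h>

/* The comparison as written in the patch: strict on both ends. */
static bool contains_strict(unsigned long start, unsigned long len,
			    unsigned long addr, unsigned long size)
{
	return addr > start && addr + size < start + len;
}

/* True if [addr, addr + size) lies entirely within [start, start + len). */
static bool contains_inclusive(unsigned long start, unsigned long len,
			       unsigned long addr, unsigned long size)
{
	return addr >= start && addr + size <= start + len;
}
```

With the strict form, a request that exactly matches a tracked range
is rejected, which is probably the most common request of all.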

> +			unmap_mapping_range(phys_mappings->mapping,
> +					    addr, size, 0);
> +			ret = 0;
> +			goto unlock_exit;

If partial overlaps can occur, we'll need an exhaustive search.
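
A sketch of what an exhaustive walk could look like, assuming partial
overlaps are allowed.  This is plain userspace C over an array;
struct range, invalidate_overlaps() and the callback type are
hypothetical, and the callback merely stands in for the
unmap_mapping_range() call:

```c
#include <stddef.h>

struct range {
	unsigned long start;
	unsigned long size;
};

/* Hypothetical callback standing in for the real invalidation. */
typedef void (*invalidate_fn)(unsigned long addr, unsigned long size,
			      void *ctx);

/* Example callback: add up how many bytes were invalidated. */
static void sum_bytes(unsigned long addr, unsigned long size, void *ctx)
{
	(void)addr;
	*(unsigned long *)ctx += size;
}

/*
 * Visit every tracked range and invalidate its intersection with
 * [addr, addr + size).  Returns the number of ranges that were hit.
 */
static int invalidate_overlaps(const struct range *tracked, size_t n,
			       unsigned long addr, unsigned long size,
			       invalidate_fn fn, void *ctx)
{
	unsigned long end = addr + size;
	int hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long lo = tracked[i].start;
		unsigned long hi = tracked[i].start + tracked[i].size;

		/* Clamp the request to this tracked range. */
		if (lo < addr)
			lo = addr;
		if (hi > end)
			hi = end;
		if (lo >= hi)
			continue;	/* no intersection */

		fn(lo, hi - lo, ctx);
		hits++;		/* keep going, later entries may overlap too */
	}
	return hits;
}
```

The point is only that the loop cannot bail out on the first match
once a request may straddle more than one tracked range.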

> +		}
> +	}
> +
> +unlock_exit:
> +	mutex_unlock(&phys_mappings->addr_desc_list_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(mdev_device_invalidate_mapping);
> +
> +/* Sanity check for the physical mapping list for mediated device */
> +
> +int mdev_add_phys_mapping(struct mdev_device *mdev,
> +			  struct address_space *mapping,
> +			  unsigned long addr, unsigned long size)
> +{
> +	struct mdev_phys_mapping *phys_mappings;
> +	struct addr_desc *addr_desc, *new_addr_desc;
> +	int ret = 0;
> +
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	phys_mappings = &mdev->phys_mappings;
> +	if (phys_mappings->mapping && (mapping != phys_mappings->mapping))
> +		return -EINVAL;
> +
> +	if (!phys_mappings->mapping) {
> +		phys_mappings->mapping = mapping;
> +		mutex_init(&phys_mappings->addr_desc_list_lock);
> +		INIT_LIST_HEAD(&phys_mappings->addr_desc_list);
> +	}

This looks racy.  Should we be acquiring the mutex earlier?
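
One possible shape, sketched in userspace C with pthreads: set the lock
up once when the object is created, then do the check-and-set of the
mapping under it.  struct phys_mappings here is a simplified,
hypothetical stand-in for the structure in the patch, and the function
names are invented:

```c
#include <pthread.h>
#include <stddef.h>

struct phys_mappings {
	pthread_mutex_t lock;	/* initialized once, at object creation */
	const void *mapping;	/* NULL until the first caller binds one */
};

/* Runs at creation time, before any other thread can see *pm. */
static void phys_mappings_init(struct phys_mappings *pm)
{
	pthread_mutex_init(&pm->lock, NULL);
	pm->mapping = NULL;
}

/* Bind on first use, or confirm a matching binding, all under the lock. */
static int phys_mappings_bind(struct phys_mappings *pm, const void *mapping)
{
	int ret = 0;

	pthread_mutex_lock(&pm->lock);
	if (!pm->mapping)
		pm->mapping = mapping;
	else if (pm->mapping != mapping)
		ret = -1;	/* a different mapping is already bound */
	pthread_mutex_unlock(&pm->lock);
	return ret;
}
```

With the lock initialized lazily inside the check, two first-time
callers can both see a NULL mapping and both initialize the mutex and
list head.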

> +
> +	mutex_lock(&phys_mappings->addr_desc_list_lock);
> +
> +	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
> +		if ((addr + size < addr_desc->start) ||
> +		    (addr_desc->start + addr_desc->size) < addr)

<= on both, I think
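
Spelled out as a standalone helper (the name is made up, ranges are
half-open, and overflow is ignored):

```c
#include <stdbool.h>

/* True if [a, a + alen) and [b, b + blen) have no byte in common. */
static bool ranges_disjoint(unsigned long a, unsigned long alen,
			    unsigned long b, unsigned long blen)
{
	return a + alen <= b || b + blen <= a;
}
```

With strict < on both sides, two ranges that merely touch end to start
would be reported as overlapping and the second one refused.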

> +			continue;
> +		else {
> +			/* should be no overlap */
> +			ret = -EINVAL;
> +			goto mapping_exit;
> +		}
> +	}
> +
> +	/* add the new entry to the list */
> +	new_addr_desc = kzalloc(sizeof(*new_addr_desc), GFP_KERNEL);
> +
> +	if (!new_addr_desc) {
> +		ret = -ENOMEM;
> +		goto mapping_exit;
> +	}
> +
> +	new_addr_desc->start = addr;
> +	new_addr_desc->size = size;
> +	list_add(&new_addr_desc->next, &phys_mappings->addr_desc_list);
> +
> +mapping_exit:
> +	mutex_unlock(&phys_mappings->addr_desc_list_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(mdev_add_phys_mapping);
> +
> +void mdev_del_phys_mapping(struct mdev_device *mdev, unsigned long addr)
> +{
> +	struct mdev_phys_mapping *phys_mappings;
> +	struct addr_desc *addr_desc;
> +
> +	if (!mdev)
> +		return;
> +
> +	phys_mappings = &mdev->phys_mappings;
> +
> +	mutex_lock(&phys_mappings->addr_desc_list_lock);
> +	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
> +		if (addr_desc->start == addr) {
> +			list_del(&addr_desc->next);
> +			kfree(addr_desc);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&phys_mappings->addr_desc_list_lock);
> +}
> +EXPORT_SYMBOL(mdev_del_phys_mapping);
> +
> +void mdev_device_supported_config(struct device *dev, char *str)
> +{
> +	struct parent_device *parent;
> +
> +	parent = mdev_get_parent_by_dev(dev);
> +
> +	if (parent) {
> +		if (parent->ops->supported_config)
> +			parent->ops->supported_config(parent->dev, str);
> +		mdev_put_parent(parent);
> +	}
> +}
> +
> +int mdev_device_start(uuid_le uuid)
> +{
> +	int ret = 0;
> +	struct mdev_device *mdev;
> +	struct parent_device *parent;
> +
> +	mdev = mdev_get_first_device_by_uuid(uuid);
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	parent = mdev->parent;
> +
> +	if (parent->ops->start)
> +		ret = parent->ops->start(mdev->uuid);

Assumes uuids do not span parent_devices?

> +
> +	if (ret)
> +		pr_err("mdev_start failed  %d\n", ret);
> +	else
> +		kobject_uevent(&mdev->dev.kobj, KOBJ_ONLINE);
> +
> +	mdev_put_device(mdev);
> +
> +	return ret;
> +}
> +
> +int mdev_device_stop(uuid_le uuid)
> +{
> +	int ret = 0;
> +	struct mdev_device *mdev;
> +	struct parent_device *parent;
> +
> +	mdev = mdev_get_first_device_by_uuid(uuid);
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	parent = mdev->parent;
> +
> +	if (parent->ops->stop)
> +		ret = parent->ops->stop(mdev->uuid);
> +
> +	if (ret)
> +		pr_err("mdev stop failed %d\n", ret);
> +	else
> +		kobject_uevent(&mdev->dev.kobj, KOBJ_OFFLINE);
> +
> +	mdev_put_device(mdev);
> +	return ret;
> +}
> +
> +static struct class mdev_class = {
> +	.name		= MDEV_CLASS_NAME,
> +	.owner		= THIS_MODULE,
> +	.class_attrs	= mdev_class_attrs,
> +};
> +
> +static int __init mdev_init(void)
> +{
> +	int ret;
> +
> +	ret = class_register(&mdev_class);
> +	if (ret) {
> +		pr_err("Failed to register mdev class\n");
> +		return ret;
> +	}
> +
> +	ret = mdev_bus_register();
> +	if (ret) {
> +		pr_err("Failed to register mdev bus\n");
> +		class_unregister(&mdev_class);
> +		return ret;
> +	}
> +
> +	return ret;
> +}
> +
> +static void __exit mdev_exit(void)
> +{
> +	mdev_bus_unregister();
> +	class_unregister(&mdev_class);
> +}
> +
> +module_init(mdev_init)
> +module_exit(mdev_exit)
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
> new file mode 100644
> index 000000000000..00680bd06224
> --- /dev/null
> +++ b/drivers/vfio/mdev/mdev_driver.c
> @@ -0,0 +1,142 @@
> +/*
> + * MDEV driver
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + *     Author: Neo Jia <cjia@xxxxxxxxxx>
> + *	       Kirti Wankhede <kwankhede@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/iommu.h>
> +#include <linux/mdev.h>
> +
> +#include "mdev_private.h"
> +
> +static int mdev_attach_iommu(struct mdev_device *mdev)
> +{
> +	int ret;
> +	struct iommu_group *group;
> +
> +	group = iommu_group_alloc();
> +	if (IS_ERR(group)) {
> +		dev_err(&mdev->dev, "MDEV: failed to allocate group!\n");
> +		return PTR_ERR(group);
> +	}
> +
> +	ret = iommu_group_add_device(group, &mdev->dev);
> +	if (ret) {
> +		dev_err(&mdev->dev, "MDEV: failed to add dev to group!\n");
> +		goto attach_fail;
> +	}
> +
> +	mdev->group = group;
> +
> +	dev_info(&mdev->dev, "MDEV: group_id = %d\n",
> +				 iommu_group_id(group));
> +attach_fail:
> +	iommu_group_put(group);
> +	return ret;
> +}
> +
> +static void mdev_detach_iommu(struct mdev_device *mdev)
> +{
> +	iommu_group_remove_device(&mdev->dev);
> +	mdev->group = NULL;
> +	dev_info(&mdev->dev, "MDEV: detaching iommu\n");
> +}
> +
> +static int mdev_probe(struct device *dev)
> +{
> +	struct mdev_driver *drv = to_mdev_driver(dev->driver);
> +	struct mdev_device *mdev = to_mdev_device(dev);
> +	int ret;
> +
> +	ret = mdev_attach_iommu(mdev);
> +	if (ret) {
> +		dev_err(dev, "Failed to attach IOMMU\n");
> +		return ret;
> +	}
> +
> +	if (drv && drv->probe)
> +		ret = drv->probe(dev);
> +
> +	if (ret)
> +		mdev_detach_iommu(mdev);
> +
> +	return ret;
> +}
> +
> +static int mdev_remove(struct device *dev)
> +{
> +	struct mdev_driver *drv = to_mdev_driver(dev->driver);
> +	struct mdev_device *mdev = to_mdev_device(dev);
> +
> +	if (drv && drv->remove)
> +		drv->remove(dev);
> +
> +	mdev_detach_iommu(mdev);
> +
> +	return 0;
> +}
> +
> +static int mdev_match(struct device *dev, struct device_driver *driver)
> +{
> +	struct mdev_driver *drv = to_mdev_driver(driver);
> +
> +	if (drv && drv->match)
> +		return drv->match(dev);
> +
> +	return 0;
> +}
> +
> +struct bus_type mdev_bus_type = {
> +	.name		= "mdev",
> +	.match		= mdev_match,
> +	.probe		= mdev_probe,
> +	.remove		= mdev_remove,
> +};
> +EXPORT_SYMBOL_GPL(mdev_bus_type);
> +
> +/*
> + * mdev_register_driver - register a new MDEV driver
> + * @drv: the driver to register
> + * @owner: module owner of driver to be registered
> + *
> + * Returns a negative value on error, otherwise 0.
> + */
> +int mdev_register_driver(struct mdev_driver *drv, struct module *owner)
> +{
> +	/* initialize common driver fields */
> +	drv->driver.name = drv->name;
> +	drv->driver.bus = &mdev_bus_type;
> +	drv->driver.owner = owner;
> +
> +	/* register with core */
> +	return driver_register(&drv->driver);
> +}
> +EXPORT_SYMBOL(mdev_register_driver);
> +
> +/*
> + * mdev_unregister_driver - unregister MDEV driver
> + * @drv: the driver to unregister
> + *
> + */
> +void mdev_unregister_driver(struct mdev_driver *drv)
> +{
> +	driver_unregister(&drv->driver);
> +}
> +EXPORT_SYMBOL(mdev_unregister_driver);
> +
> +int mdev_bus_register(void)
> +{
> +	return bus_register(&mdev_bus_type);
> +}
> +
> +void mdev_bus_unregister(void)
> +{
> +	bus_unregister(&mdev_bus_type);
> +}
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> new file mode 100644
> index 000000000000..ee2db61a8091
> --- /dev/null
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -0,0 +1,33 @@
> +/*
> + * Mediated device internal definitions
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + *     Author: Neo Jia <cjia@xxxxxxxxxx>
> + *	       Kirti Wankhede <kwankhede@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef MDEV_PRIVATE_H
> +#define MDEV_PRIVATE_H
> +
> +int  mdev_bus_register(void);
> +void mdev_bus_unregister(void);
> +
> +/* Function prototypes for mdev_sysfs */
> +
> +extern struct class_attribute mdev_class_attrs[];
> +
> +int  mdev_create_sysfs_files(struct device *dev);
> +void mdev_remove_sysfs_files(struct device *dev);
> +
> +int  mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
> +			char *mdev_params);
> +int  mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance);
> +void mdev_device_supported_config(struct device *dev, char *str);
> +int  mdev_device_start(uuid_le uuid);
> +int  mdev_device_stop(uuid_le uuid);
> +
> +#endif /* MDEV_PRIVATE_H */
> diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
> new file mode 100644
> index 000000000000..e0457e68cf78
> --- /dev/null
> +++ b/drivers/vfio/mdev/mdev_sysfs.c
> @@ -0,0 +1,269 @@
> +/*
> + * File attributes for Mediated devices
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + *     Author: Neo Jia <cjia@xxxxxxxxxx>
> + *	       Kirti Wankhede <kwankhede@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/sysfs.h>
> +#include <linux/ctype.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/uuid.h>
> +#include <linux/mdev.h>
> +
> +#include "mdev_private.h"
> +
> +/* Prototypes */
> +static ssize_t mdev_supported_types_show(struct device *dev,
> +					 struct device_attribute *attr,
> +					 char *buf);
> +static DEVICE_ATTR_RO(mdev_supported_types);
> +
> +static ssize_t mdev_create_store(struct device *dev,
> +				 struct device_attribute *attr,
> +				 const char *buf, size_t count);
> +static DEVICE_ATTR_WO(mdev_create);
> +
> +static ssize_t mdev_destroy_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t count);
> +static DEVICE_ATTR_WO(mdev_destroy);
> +
> +/* Static functions */
> +
> +
> +#define SUPPORTED_TYPE_BUFFER_LENGTH	4096
> +
> +/* mdev sysfs Functions */
> +static ssize_t mdev_supported_types_show(struct device *dev,
> +					 struct device_attribute *attr,
> +					 char *buf)
> +{
> +	char *str, *ptr;
> +	ssize_t n;
> +
> +	str = kzalloc(sizeof(*str) * SUPPORTED_TYPE_BUFFER_LENGTH, GFP_KERNEL);
> +	if (!str)
> +		return -ENOMEM;
> +
> +	ptr = str;
> +	mdev_device_supported_config(dev, str);
> +
> +	n = sprintf(buf, "%s\n", str);
> +	kfree(ptr);
> +
> +	return n;
> +}
> +
> +static ssize_t mdev_create_store(struct device *dev,
> +				 struct device_attribute *attr,
> +				 const char *buf, size_t count)
> +{
> +	char *str, *pstr;
> +	char *uuid_str, *instance_str, *mdev_params = NULL, *params = NULL;
> +	uuid_le uuid;
> +	uint32_t instance;
> +	int ret;
> +
> +	pstr = str = kstrndup(buf, count, GFP_KERNEL);
> +
> +	if (!str)
> +		return -ENOMEM;
> +
> +	uuid_str = strsep(&str, ":");
> +	if (!uuid_str) {
> +		pr_err("mdev_create: Empty UUID string %s\n", buf);
> +		ret = -EINVAL;
> +		goto create_error;
> +	}
> +
> +	if (!str) {
> +		pr_err("mdev_create: mdev instance not present %s\n", buf);
> +		ret = -EINVAL;
> +		goto create_error;
> +	}
> +
> +	instance_str = strsep(&str, ":");
> +	if (!instance_str) {
> +		pr_err("mdev_create: Empty instance string %s\n", buf);
> +		ret = -EINVAL;
> +		goto create_error;
> +	}
> +
> +	ret = kstrtouint(instance_str, 0, &instance);
> +	if (ret) {
> +		pr_err("mdev_create: mdev instance parsing error %s\n", buf);
> +		goto create_error;
> +	}
> +
> +	if (str)
> +		params = mdev_params = kstrdup(str, GFP_KERNEL);
> +
> +	ret = uuid_le_to_bin(uuid_str, &uuid);
> +	if (ret) {
> +		pr_err("mdev_create: UUID parse error %s\n", buf);
> +		goto create_error;
> +	}
> +
> +	ret = mdev_device_create(dev, uuid, instance, mdev_params);
> +	if (ret)
> +		pr_err("mdev_create: Failed to create mdev device\n");
> +	else
> +		ret = count;
> +
> +create_error:
> +	kfree(params);
> +	kfree(pstr);
> +	return ret;
> +}
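
For what it's worth, with a non-NULL input strsep() never returns NULL,
it returns an empty string for an empty field, so I don't think the
!uuid_str and !instance_str checks above can ever fire.  Below is a
minimal userspace sketch of the "UUID:instance[:params]" parsing, only to
illustrate the point.  The struct, the function name and the buffer sizes
are made up for the example, strtoul() stands in for kstrtouint(), and the
UUID is copied rather than validated (the kernel code relies on
uuid_le_to_bin() for that).

```c
#define _DEFAULT_SOURCE		/* strsep() on glibc */
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the parsed request. */
struct create_req {
	char uuid[64];		/* copied verbatim, not validated here */
	unsigned int instance;
	char params[128];	/* empty string when no params were given */
};

/*
 * Parse "UUID:instance[:params]" with two strsep() calls; whatever is
 * left after the second ':' is the params string.
 * Returns 0 on success or a negative errno value.
 */
static int parse_create_req(const char *buf, struct create_req *req)
{
	char *copy, *str, *uuid_str, *instance_str, *end;
	unsigned long val;
	int ret = -EINVAL;

	copy = str = strdup(buf);
	if (!copy)
		return -ENOMEM;

	/* Never NULL here; an empty field comes back as "". */
	uuid_str = strsep(&str, ":");
	if (!str)		/* no ':' found, so no instance field */
		goto out;
	if (*uuid_str == '\0' || strlen(uuid_str) >= sizeof(req->uuid))
		goto out;

	instance_str = strsep(&str, ":");
	if (!isdigit((unsigned char)*instance_str))	/* also rejects "" */
		goto out;

	errno = 0;
	val = strtoul(instance_str, &end, 0);
	if (*end == '\n')	/* tolerate one trailing newline from echo */
		end++;
	if (errno || *end != '\0' || val > UINT_MAX)
		goto out;

	if (str && strlen(str) >= sizeof(req->params))
		goto out;

	strcpy(req->uuid, uuid_str);
	req->instance = (unsigned int)val;
	strcpy(req->params, str ? str : "");
	ret = 0;
out:
	free(copy);
	return ret;
}
```

If the empty-field cases are meant to be caught early, testing for an
empty string rather than NULL would do it.
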
> +
> +static ssize_t mdev_destroy_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t count)
> +{

I wonder if we should just have a "remove" file in sysfs under the
device.
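
To make the difference concrete from the userspace side, here is a rough
sketch of what a management tool would have to put together in each case.
Both helper names are hypothetical, and the per-device "remove" attribute
does not exist in this patch, it is only the idea floated above.  The
mdev's sysfs directory is taken as a parameter since its naming is decided
elsewhere in the series.

```c
#include <stdio.h>

/*
 * As posted: the request is written to the parent's mdev_destroy file
 * and must carry "UUID:instance" so the core can look the device up.
 * Returns the snprintf() result.
 */
static int fmt_destroy_request(char *buf, size_t len, const char *uuid,
			       unsigned int instance)
{
	return snprintf(buf, len, "%s:%u", uuid, instance);
}

/*
 * Hypothetical alternative: a "remove" attribute in the mdev's own sysfs
 * directory.  The path alone identifies the device, so nothing needs to
 * be parsed out of the written data.
 */
static int fmt_remove_path(char *buf, size_t len, const char *mdev_sysfs_dir)
{
	return snprintf(buf, len, "%s/remove", mdev_sysfs_dir);
}
```

With the second form the store function would not need any of the string
splitting below.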

> +	char *uuid_str, *str, *pstr;
> +	uuid_le uuid;
> +	unsigned int instance;
> +	int ret;
> +
> +	str = pstr = kstrndup(buf, count, GFP_KERNEL);
> +
> +	if (!str)
> +		return -ENOMEM;
> +
> +	uuid_str = strsep(&str, ":");
> +	if (!uuid_str) {
> +		pr_err("mdev_destroy: Empty UUID string %s\n", buf);
> +		ret = -EINVAL;
> +		goto destroy_error;
> +	}
> +
> +	if (str == NULL) {
> +		pr_err("mdev_destroy: instance not specified %s\n", buf);
> +		ret = -EINVAL;
> +		goto destroy_error;
> +	}
> +
> +	ret = kstrtouint(str, 0, &instance);
> +	if (ret) {
> +		pr_err("mdev_destroy: instance parsing error %s\n", buf);
> +		goto destroy_error;
> +	}
> +
> +	ret = uuid_le_to_bin(uuid_str, &uuid);
> +	if (ret) {
> +		pr_err("mdev_destroy: UUID parse error  %s\n", buf);
> +		goto destroy_error;
> +	}
> +
> +	ret = mdev_device_destroy(dev, uuid, instance);
> +	if (ret == 0)
> +		ret = count;
> +
> +destroy_error:
> +	kfree(pstr);
> +	return ret;
> +}
> +
> +ssize_t mdev_start_store(struct class *class, struct class_attribute *attr,
> +			 const char *buf, size_t count)
> +{
> +	char *uuid_str, *ptr;
> +	uuid_le uuid;
> +	int ret;
> +
> +	ptr = uuid_str = kstrndup(buf, count, GFP_KERNEL);
> +
> +	if (!uuid_str)
> +		return -ENOMEM;
> +
> +	ret = uuid_le_to_bin(uuid_str, &uuid);
> +	if (ret) {
> +		pr_err("mdev_start: UUID parse error  %s\n", buf);
> +		goto start_error;
> +	}
> +
> +	ret = mdev_device_start(uuid);
> +	if (ret == 0)
> +		ret = count;
> +
> +start_error:
> +	kfree(ptr);
> +	return ret;
> +}
> +
> +ssize_t mdev_stop_store(struct class *class, struct class_attribute *attr,
> +			    const char *buf, size_t count)
> +{
> +	char *uuid_str, *ptr;
> +	uuid_le uuid;
> +	int ret;
> +
> +	ptr = uuid_str = kstrndup(buf, count, GFP_KERNEL);
> +
> +	if (!uuid_str)
> +		return -ENOMEM;
> +
> +	ret = uuid_le_to_bin(uuid_str, &uuid);
> +	if (ret) {
> +		pr_err("mdev_stop: UUID parse error %s\n", buf);
> +		goto stop_error;
> +	}
> +
> +	ret = mdev_device_stop(uuid);
> +	if (ret == 0)
> +		ret = count;
> +
> +stop_error:
> +	kfree(ptr);
> +	return ret;
> +
> +}
> +
> +struct class_attribute mdev_class_attrs[] = {
> +	__ATTR_WO(mdev_start),
> +	__ATTR_WO(mdev_stop),
> +	__ATTR_NULL
> +};
> +
> +int mdev_create_sysfs_files(struct device *dev)
> +{
> +	int ret;
> +
> +	ret = sysfs_create_file(&dev->kobj,
> +				&dev_attr_mdev_supported_types.attr);
> +	if (ret) {
> +		pr_err("Failed to create mdev_supported_types sysfs entry\n");
> +		return ret;
> +	}
> +
> +	ret = sysfs_create_file(&dev->kobj, &dev_attr_mdev_create.attr);
> +	if (ret) {
> +		pr_err("Failed to create mdev_create sysfs entry\n");
> +		goto create_sysfs_failed;
> +	}
> +
> +	ret = sysfs_create_file(&dev->kobj, &dev_attr_mdev_destroy.attr);
> +	if (ret) {
> +		pr_err("Failed to create mdev_destroy sysfs entry\n");
> +		sysfs_remove_file(&dev->kobj, &dev_attr_mdev_create.attr);
> +	} else
> +		return ret;
> +
> +create_sysfs_failed:
> +	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_supported_types.attr);
> +	return ret;
> +}
> +
> +void mdev_remove_sysfs_files(struct device *dev)
> +{
> +	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_supported_types.attr);
> +	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_create.attr);
> +	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_destroy.attr);
> +}
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> new file mode 100644
> index 000000000000..0b41f301a9b7
> --- /dev/null
> +++ b/include/linux/mdev.h
> @@ -0,0 +1,236 @@
> +/*
> + * Mediated device definition
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + *     Author: Neo Jia <cjia@xxxxxxxxxx>
> + *	       Kirti Wankhede <kwankhede@xxxxxxxxxx>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef MDEV_H
> +#define MDEV_H
> +
> +#include <uapi/linux/vfio.h>
> +
> +struct parent_device;
> +
> +/*
> + * Mediated device
> + */
> +
> +struct addr_desc {
> +	unsigned long start;
> +	unsigned long size;
> +	struct list_head next;
> +};
> +
> +struct mdev_phys_mapping {
> +	struct address_space *mapping;
> +	struct list_head addr_desc_list;
> +	struct mutex addr_desc_list_lock;
> +};
> +
> +struct mdev_device {
> +	struct device		dev;
> +	struct parent_device	*parent;
> +	struct iommu_group	*group;
> +	uuid_le			uuid;
> +	uint32_t		instance;
> +	void			*driver_data;
> +
> +	/* internal only */
> +	struct kref		ref;
> +	struct list_head	next;
> +
> +	struct mdev_phys_mapping phys_mappings;
> +};
> +
> +
> +/**
> + * struct parent_ops - Structure to be registered for each parent device to
> + * register the device to mdev module.
> + *
> + * @owner:		The module owner.
> + * @dev_attr_groups:	Default attributes of the parent device.
> + * @mdev_attr_groups:	Default attributes of the mediated device.
> + * @supported_config:	Called to get information about supported types.
> + *			@dev : device structure of parent device.
> + *			@config: should return string listing supported config
> + *			Returns integer: success (0) or error (< 0)
> + * @create:		Called to allocate basic resources in parent device's
> + *			driver for a particular mediated device. It is
> + *			mandatory to provide create ops.
> + *			@mdev: mdev_device structure of the mediated device
> + *			      that is being created
> + *			@mdev_params: extra parameters required by parent
> + *			device's driver.
> + *			Returns integer: success (0) or error (< 0)
> + * @destroy:		Called to free resources in parent device's driver for a
> + *			mediated device instance. It is mandatory to provide
> + *			destroy ops.
> + *			@mdev: mdev_device device structure which is being
> + *			       destroyed
> + *			Returns integer: success (0) or error (< 0)
> + *			If VMM is running and destroy() is called that means the
> + *			mdev is being hot-unplugged. Return error if VMM is
> + *			running and driver doesn't support mediated device
> + *			hotplug.
> + * @reset:		Called to reset mediated device.
> + *			@mdev: mdev_device device structure
> + *			Returns integer: success (0) or error (< 0)
> + * @start:		Called to initiate mediated device initialization
> + *			process in parent device's driver before VMM starts.
> + *			@uuid: UUID
> + *			Returns integer: success (0) or error (< 0)
> + * @stop:		Called to teardown mediated device related resources
> + *			@uuid: UUID
> + *			Returns integer: success (0) or error (< 0)
> + * @read:		Read emulation callback
> + *			@mdev: mediated device structure
> + *			@buf: read buffer
> + *			@count: number of bytes to read
> + *			@pos: address.
> + *			Returns number of bytes read on success or error.
> + * @write:		Write emulation callback
> + *			@mdev: mediated device structure
> + *			@buf: write buffer
> + *			@count: number of bytes to be written
> + *			@pos: address.
> + *			Returns number of bytes written on success or error.
> + * @set_irqs:		Called to pass the interrupt configuration
> + *			information that VMM sets.
> + *			@mdev: mediated device structure
> + *			@flags, index, start, count and *data : same as that of
> + *			struct vfio_irq_set of VFIO_DEVICE_SET_IRQS API.
> + * @get_region_info:	Called to get VFIO region size and flags of mediated
> + *			device.
> + *			@mdev: mediated device structure
> + *			@region_index: VFIO region index
> + *			@region_info: output, returns size and flags of
> + *				      requested region.
> + *			Returns integer: success (0) or error (< 0)
> + * @validate_map_request: Validate remap pfn request
> + *			@mdev: mediated device structure
> + *			@pos: address
> + *			@virtaddr: target user address to start at. Vendor
> + *				   driver can change if required.
> + *			@pfn: parent address of kernel memory, vendor driver
> + *			      can change if required.
> + *			@size: size of map area, vendor driver can change the
> + *			       size of map area if desired.
> + *			@prot: page protection flags for this mapping, vendor
> + *			       driver can change, if required.
> + *			Returns integer: success (0) or error (< 0)
> + *
> + * A parent device that supports mediated devices should be registered with
> + * the mdev module using a parent_ops structure.
> + */
> +
> +struct parent_ops {
> +	struct module   *owner;
> +	const struct attribute_group **dev_attr_groups;
> +	const struct attribute_group **mdev_attr_groups;
> +
> +	int	(*supported_config)(struct device *dev, char *config);
> +	int     (*create)(struct mdev_device *mdev, char *mdev_params);
> +	int     (*destroy)(struct mdev_device *mdev);
> +	int     (*reset)(struct mdev_device *mdev);
> +	int     (*start)(uuid_le uuid);
> +	int     (*stop)(uuid_le uuid);
> +	ssize_t (*read)(struct mdev_device *mdev, char *buf, size_t count,
> +			loff_t pos);
> +	ssize_t (*write)(struct mdev_device *mdev, char *buf, size_t count,
> +			 loff_t pos);
> +	int     (*set_irqs)(struct mdev_device *mdev, uint32_t flags,
> +			    unsigned int index, unsigned int start,
> +			    unsigned int count, void *data);
> +	int	(*get_region_info)(struct mdev_device *mdev, int region_index,
> +				   struct vfio_region_info *region_info);
> +	int	(*validate_map_request)(struct mdev_device *mdev, loff_t pos,
> +					u64 *virtaddr, unsigned long *pfn,
> +					unsigned long *size, pgprot_t *prot);
> +};
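
Since the kernel-doc calls create and destroy mandatory, I would expect
registration to refuse an ops table that lacks either one.  Whether the
core enforces that is not visible in this part of the patch, so the
following is only a userspace model of the check I have in mind.  The
cut-down struct, the check function and the demo callbacks are all
hypothetical names for the example.

```c
#include <errno.h>
#include <stddef.h>

struct mdev_device;	/* opaque for the purposes of this model */

/* Cut-down, hypothetical stand-in for struct parent_ops. */
struct parent_ops_model {
	int (*create)(struct mdev_device *mdev, char *mdev_params);
	int (*destroy)(struct mdev_device *mdev);
	int (*reset)(struct mdev_device *mdev);	/* not documented as mandatory */
};

/*
 * Reject an ops table that is missing a callback the kernel-doc
 * describes as mandatory.  Returns 0 if usable, -EINVAL otherwise.
 */
static int check_parent_ops(const struct parent_ops_model *ops)
{
	if (!ops || !ops->create || !ops->destroy)
		return -EINVAL;
	return 0;
}

/* Trivial demo callbacks so a table can be populated. */
static int demo_create(struct mdev_device *mdev, char *mdev_params)
{
	(void)mdev;
	(void)mdev_params;
	return 0;
}

static int demo_destroy(struct mdev_device *mdev)
{
	(void)mdev;
	return 0;
}
```

Doing this once at registration time would mean the create and destroy
paths never have to test the pointers themselves.
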
> +
> +/*
> + * Parent Device
> + */
> +
> +struct parent_device {
> +	struct device		*dev;
> +	const struct parent_ops	*ops;
> +
> +	/* internal */
> +	struct kref		ref;
> +	struct list_head	next;
> +	struct list_head	mdev_list;
> +	struct mutex		mdev_list_lock;
> +	wait_queue_head_t	release_done;
> +};
> +
> +/**
> + * struct mdev_driver - Mediated device driver
> + * @name: driver name
> + * @probe: called when new device created
> + * @remove: called when device removed
> + * @match: called when new device or driver is added for this bus. Return 1 if
> + *	   given device can be handled by given driver and zero otherwise.
> + * @driver: device driver structure
> + *
> + **/
> +struct mdev_driver {
> +	const char *name;
> +	int  (*probe)(struct device *dev);
> +	void (*remove)(struct device *dev);
> +	int  (*match)(struct device *dev);
> +	struct device_driver driver;
> +};
> +
> +static inline struct mdev_driver *to_mdev_driver(struct device_driver *drv)
> +{
> +	return drv ? container_of(drv, struct mdev_driver, driver) : NULL;
> +}
> +
> +static inline struct mdev_device *to_mdev_device(struct device *dev)
> +{
> +	return dev ? container_of(dev, struct mdev_device, dev) : NULL;
> +}
> +
> +static inline void *mdev_get_drvdata(struct mdev_device *mdev)
> +{
> +	return mdev->driver_data;
> +}
> +
> +static inline void mdev_set_drvdata(struct mdev_device *mdev, void *data)
> +{
> +	mdev->driver_data = data;
> +}
> +
> +extern struct bus_type mdev_bus_type;
> +
> +#define dev_is_mdev(d) ((d)->bus == &mdev_bus_type)
> +
> +extern int  mdev_register_device(struct device *dev,
> +				 const struct parent_ops *ops);
> +extern void mdev_unregister_device(struct device *dev);
> +
> +extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> +extern void mdev_unregister_driver(struct mdev_driver *drv);
> +
> +extern struct mdev_device *mdev_get_device(struct mdev_device *mdev);
> +extern void mdev_put_device(struct mdev_device *mdev);
> +
> +extern struct mdev_device *mdev_get_device_by_group(struct iommu_group *group);
> +
> +extern int mdev_device_invalidate_mapping(struct mdev_device *mdev,
> +					unsigned long addr, unsigned long size);
> +
> +extern int mdev_add_phys_mapping(struct mdev_device *mdev,
> +				 struct address_space *mapping,
> +				 unsigned long addr, unsigned long size);
> +
> +
> +extern void mdev_del_phys_mapping(struct mdev_device *mdev, unsigned long addr);
> +#endif /* MDEV_H */



