Re: [PATCH 2/3] VFIO driver for mediated PCI device

Kirti Wankhede <kwankhede@xxxxxxxxxx> · Sat, 25 Jun 2016 00:04:27 +0530

Thanks Alex.

On 6/22/2016 4:18 AM, Alex Williamson wrote:
> On Mon, 20 Jun 2016 22:01:47 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> 
>> +
>> +static int get_mdev_region_info(struct mdev_device *mdev,
>> +				struct pci_region_info *vfio_region_info,
>> +				int index)
>> +{
>> +	int ret = -EINVAL;
>> +	struct parent_device *parent = mdev->parent;
>> +
>> +	if (parent && dev_is_pci(parent->dev) && parent->ops->get_region_info) {
>> +		mutex_lock(&mdev->ops_lock);
>> +		ret = parent->ops->get_region_info(mdev, index,
>> +						    vfio_region_info);
>> +		mutex_unlock(&mdev->ops_lock);
> 
> Why do we have two ops_lock, one on the parent_device and one on the
> mdev_device?!  Is this one actually locking anything or also just
> providing serialization?  Why do some things get serialized at the
> parent level and some things at the device level?  Very confused by
> ops_lock.
>

There are two sets of callback:
* parent device callbacks: supported_config, create, destroy, start, stop
* mdev device callbacks: read, write, set_irqs, get_region_info,
validate_map_request

parent->ops_lock is to serialize per parent device callbacks.
mdev->ops_lock is to serialize per mdev device callbacks.

I'll add above comment in mdev.h.

>> +
>> +static int mdev_get_irq_count(struct vfio_mdev *vmdev, int irq_type)
>> +{
>> +	/* Don't support MSIX for now */
>> +	if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX)
>> +		return -1;
>> +
>> +	return 1;
> 
> Too much hard coding here, the mediated driver should define this.
> 

I'm testing INTX and MSI, I don't have a way to test MSIX for now. So we
thought we can add supported for MSIX later. Till then hard code it to 1.

>> +
>> +		if (parent && parent->ops->set_irqs) {
>> +			mutex_lock(&mdev->ops_lock);
>> +			ret = parent->ops->set_irqs(mdev, hdr.flags, hdr.index,
>> +						    hdr.start, hdr.count, data);
>> +			mutex_unlock(&mdev->ops_lock);
> 
> Device level serialization on set_irqs... interesting.
> 

Hope answer above helps to clarify this.

>> +		}
>> +
>> +		kfree(ptr);
>> +		return ret;
>> +	}
>> +	}
>> +	return -ENOTTY;
>> +}
>> +
>> +ssize_t mdev_dev_config_rw(struct vfio_mdev *vmdev, char __user *buf,
>> +			   size_t count, loff_t *ppos, bool iswrite)
>> +{
>> +	struct mdev_device *mdev = vmdev->mdev;
>> +	struct parent_device *parent = mdev->parent;
>> +	int size = vmdev->vfio_region_info[VFIO_PCI_CONFIG_REGION_INDEX].size;
>> +	int ret = 0;
>> +	uint64_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +
>> +	if (pos < 0 || pos >= size ||
>> +	    pos + count > size) {
>> +		pr_err("%s pos 0x%llx out of range\n", __func__, pos);
>> +		ret = -EFAULT;
>> +		goto config_rw_exit;
>> +	}
>> +
>> +	if (iswrite) {
>> +		char *usr_data, *ptr;
>> +
>> +		ptr = usr_data = memdup_user(buf, count);
>> +		if (IS_ERR(usr_data)) {
>> +			ret = PTR_ERR(usr_data);
>> +			goto config_rw_exit;
>> +		}
>> +
>> +		ret = parent->ops->write(mdev, usr_data, count,
>> +					  EMUL_CONFIG_SPACE, pos);
> 
> No serialization on this ops, thank goodness, but why?
>

Its there at caller of mdev_dev_rw().

> This read/write interface still seems strange to me...
> 

Replied on this in 1st Patch.

>> +
>> +		memcpy((void *)(vmdev->vconfig + pos), (void *)usr_data, count);
>> +		kfree(ptr);
>> +	} else {
>> +		char *ret_data, *ptr;
>> +
>> +		ptr = ret_data = kzalloc(count, GFP_KERNEL);
>> +
>> +		if (IS_ERR(ret_data)) {
>> +			ret = PTR_ERR(ret_data);
>> +			goto config_rw_exit;
>> +		}
>> +
>> +		ret = parent->ops->read(mdev, ret_data, count,
>> +					EMUL_CONFIG_SPACE, pos);
>> +
>> +		if (ret > 0) {
>> +			if (copy_to_user(buf, ret_data, ret))
>> +				ret = -EFAULT;
>> +			else
>> +				memcpy((void *)(vmdev->vconfig + pos),
>> +					(void *)ret_data, count);
>> +		}
>> +		kfree(ptr);
> 
> So vconfig caches all of config space for the mdev, but we only ever
> use it to read the BAR address via mdev_read_base()... why?  I hope the
> mdev driver doesn't freak out if the user reads the mmio region before
> writing a base address (remember the vfio API aspect of the interface
> doesn't necessarily follow the VM PCI programming API)
> 

How could user read mmio region from guest before writing base address?
Isn't that would be a bug?
>From our driver if pos is not within the base address range, then we
return error for read/write.

>> +	}
>> +config_rw_exit:
>> +
>> +	if (ret > 0)
>> +		*ppos += ret;
>> +
>> +	return ret;
>> +}
>> +
>> +ssize_t mdev_dev_bar_rw(struct vfio_mdev *vmdev, char __user *buf,
>> +			size_t count, loff_t *ppos, bool iswrite)
>> +{
>> +	struct mdev_device *mdev = vmdev->mdev;
>> +	struct parent_device *parent = mdev->parent;
>> +	loff_t offset = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	loff_t pos;
>> +	int bar_index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	int ret = 0;
>> +
>> +	if (!vmdev->vfio_region_info[bar_index].start)
>> +		mdev_read_base(vmdev);
>> +
>> +	if (offset >= vmdev->vfio_region_info[bar_index].size) {
>> +		ret = -EINVAL;
>> +		goto bar_rw_exit;
>> +	}
>> +
>> +	count = min(count,
>> +		    (size_t)(vmdev->vfio_region_info[bar_index].size - offset));
>> +
>> +	pos = vmdev->vfio_region_info[bar_index].start + offset;
> 
> In the case of a mpci dev, @start is the vconfig BAR value, so it's
> user (guest) writable, and the mediated driver is supposed to
> understand that?  I suppose is saw the config write too, if there was
> one, but the mediated driver gives us region info based on region index.
> We have the region index here.  Why wouldn't we do reads and writes
> based on region index and offset and eliminate vconfig?  Seems like
> that would consolidate a lot of this, we don't care what we're reading
> and writing, just pass it through.  Mediated pci drivers would simply
> need to match indexes to those already defined for vfio-pci.
> 

Ok, looking at it. so this will remove vconfig completely.

Thanks,
Kirti.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html