Re: [PATCH V7 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

On 13/12/2023 10:23, Tian, Kevin wrote:
From: Yishai Hadas <yishaih@xxxxxxxxxx>
Sent: Thursday, December 7, 2023 6:28 PM

Any read/write towards the control parts of the BAR will be captured by
the new driver and will be translated into admin commands towards the
device.

Any data path read/write access (i.e. virtio driver notifications) will
be forwarded to the physical BAR whose properties were supplied by
the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
probing/init flow.

This is still captured by the new driver; the difference is just between
using admin cmds vs. directly accessing the BAR when emulating the access.

OK, I can rephrase the above to clarify that.


+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO NET PCI devices"
+        depends on VIRTIO_PCI
+        select VFIO_PCI_CORE
+        help
+          This provides support for exposing VIRTIO NET VF devices which support
+          legacy IO access, using the VFIO framework that can work with a legacy
+          virtio driver in the guest.
+          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
+          not indicate I/O Space.

"thus, ..." duplicates the former part.

+          As of that this driver emulated I/O BAR in software to let a VF be

s/emulated/emulates/

OK

+          seen as a transitional device in the guest and let it work with
+          a legacy driver.

VFIO is not specific to the guest. A native application including a legacy
virtio driver could also benefit; let's not write it in a way specific to virt.


OK, we can rephrase to the below.
" .. to let a VF be seen as a transitional device by its users and .."


+
+static int
+translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
+			    loff_t pos, char __user *buf,
+			    size_t count, bool read)

This name only talks about the behavior for VIRTIO_PCI_QUEUE_NOTIFY.

For the legacy admin cmd it's unclear whether it's actually conveyed to a
mem BAR.

Is it clearer to call it virtiovf_pci_bar0_rw()?

I'm fine with your rename suggestion, will be part of V8.


+
+static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
+					char __user *buf, size_t count,
+					loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	size_t register_offset;
+	loff_t copy_offset;
+	size_t copy_count;
+	__le32 val32;
+	__le16 val16;
+	u8 val8;
+	int ret;
+
+	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
+	if (ret < 0)
+		return ret;
+
+	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
+			return -EFAULT;
+	}
+
+	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
+	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
+				   copy_count))
+			return -EFAULT;
+		val16 |= cpu_to_le16(PCI_COMMAND_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}

The write handler calls vfio_pci_core_write() for PCI_COMMAND, so
the core vconfig should have the latest copy of the IO bit value, which
is copied to the user buffer by vfio_pci_core_read(). Then it is not
necessary to update it again.

You assume that the 'vconfig' mechanism/flow is always applicable for that specific field; this should be double-checked. However, as for now the driver doesn't rely on / use the vconfig for other fields, as it doesn't match and would need a big refactor, I prefer not to rely on it at all and to keep the emulation here.
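For readers following the splicing in the read handler above: range_intersect_range() computes the overlap between the user access window and an emulated register, and reports where to copy within the user buffer, how many bytes overlap, and the offset into the register value. Below is an illustrative userspace reimplementation of those semantics (not the kernel helper itself; it uses int64_t where the kernel uses loff_t):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Intersect a user access [buf_start, buf_start + buf_count) with an
 * emulated register [reg_start, reg_start + reg_count).  On overlap,
 * report where to copy within the user buffer (*copy_offset), how many
 * bytes overlap (*copy_count), and the offset into the register value
 * (*register_offset).  Illustrative sketch of the semantics used by the
 * config-space read/write handlers quoted above.
 */
static bool range_intersect_range(int64_t buf_start, size_t buf_count,
				  int64_t reg_start, size_t reg_count,
				  int64_t *copy_offset, size_t *copy_count,
				  size_t *register_offset)
{
	int64_t buf_end = buf_start + (int64_t)buf_count;
	int64_t reg_end = reg_start + (int64_t)reg_count;
	int64_t start = buf_start > reg_start ? buf_start : reg_start;
	int64_t end = buf_end < reg_end ? buf_end : reg_end;

	if (start >= end)
		return false;

	*copy_offset = start - buf_start;		/* offset into the user buffer */
	*copy_count = (size_t)(end - start);		/* number of overlapping bytes */
	*register_offset = (size_t)(start - reg_start);	/* offset into the register */
	return true;
}
```

For example, a 4-byte config read at offset 0 overlapping PCI_DEVICE_ID (offset 2, 2 bytes) yields copy_offset=2, copy_count=2, register_offset=0, which is exactly how the handler knows to splice the emulated device ID into the middle of the user buffer.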


btw the approach in this patch sounds a bit hackish - it modifies the
result before/after the vfio pci core emulation instead of directly injecting
its specific emulation logic into vfio vconfig. It's probably because
vfio vconfig currently has a global permission/handler scheme for
all pci devices. Extending it to support per-device tweaks might need
lots of change.

Right, the vconfig is not ready for that and might require lots of change; for now everything is done at the driver layer.


So I'm not advocating that big change at this point, especially when
only this driver imposes such a requirement now. But in the future, when
more drivers (e.g. Ankit's nvgrace-gpu) want to do a similar tweak, we
may consider such a possibility.

This can be an orthogonal future refactoring once we have it.


+
+	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
+				  &copy_offset, &copy_count, &register_offset)) {
+		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
+		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
+
+		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) | PCI_BASE_ADDRESS_SPACE_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
+			return -EFAULT;
+	}

Do we care about the initial value of bar0? This patch leaves it as 0,
unlike other real bars initialized with the hw value. In reality this
may not be a problem, as software usually writes all 1's to detect
the size as the first step.

I raise it just in case others may see a potential issue.

We don't see an issue here, as you mentioned above.
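The sizing handshake Kevin refers to works because of the bar_mask arithmetic in the read handler above: software writes all 1's to the BAR, reads it back, and the bits forced to zero encode the size. A minimal sketch of both sides, assuming bar0_virtual_buf_size is a power of two (the helper names here are illustrative, not from the patch):

```c
#include <stdint.h>

#define PCI_BASE_ADDRESS_SPACE_IO 0x01

/*
 * What the emulated BAR0 read above returns: the stored base masked to
 * the BAR's size alignment, with the low bit advertising an I/O BAR.
 * Assumes virtual_buf_size is a power of two.
 */
static uint32_t emulated_bar0_read(uint32_t stored_base,
				   uint32_t virtual_buf_size)
{
	uint32_t bar_mask = ~(virtual_buf_size - 1);

	return (stored_base & bar_mask) | PCI_BASE_ADDRESS_SPACE_IO;
}

/*
 * What legacy software does: after writing all 1's, mask off the
 * indicator bits (bits 1:0 for an I/O BAR) and take the two's
 * complement of the remaining address mask to recover the size.
 */
static uint32_t probed_bar_size(uint32_t readback_after_all_ones)
{
	uint32_t addr_bits = readback_after_all_ones & ~0x3u;

	return ~addr_bits + 1;
}
```

With a 256-byte virtual buffer, writing 0xFFFFFFFF and reading back gives 0xFFFFFF01, from which the probe recovers a size of 256; the initial value of the stored base (0 or otherwise) doesn't affect this handshake, which is why leaving it as 0 is harmless.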


+
+static ssize_t
+virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
+		size_t register_offset;
+		loff_t copy_offset;
+		size_t copy_count;
+
+		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+
+		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
+					  sizeof(virtvdev->pci_base_addr_0),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+	}

wrap above into virtiovf_pci_write_config() to be symmetric with
the read path.

In the read path we do the full flow and return to the user. Here we just save some data and continue to call the core, so I'm not sure this is worth a dedicated function.

However, this can be done; do you still suggest it for V8?


+static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev;
+	int ret;
+
+	ret = vfio_pci_core_init_dev(core_vdev);
+	if (ret)
+		return ret;
+
+	pdev = virtvdev->core_device.pdev;
+	ret = virtiovf_read_notify_info(virtvdev);
+	if (ret)
+		return ret;
+
+	/* Being ready with a buffer that supports MSIX */
+	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
+				virtiovf_get_device_config_size(pdev->device);

Which code is relevant to MSIX?

The buffer size must include the MSIX part to match the virtio-net specification.

As part of virtiovf_issue_legacy_rw_cmd() we may use the full buffer upon read/write.
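The MSI-X dependency comes from the legacy virtio I/O BAR layout: the common config header is 20 bytes, or 24 bytes when MSI-X is enabled (two extra 16-bit MSI-X vector fields), and the device-specific config follows it. Sizing the virtual buffer for the MSI-X case therefore covers both layouts. A sketch of the computation, mirroring the VIRTIO_PCI_CONFIG_OFF() macro from <linux/virtio_pci.h> (the wrapper function name is illustrative):

```c
#include <stddef.h>

/*
 * Offset of the device-specific config within the legacy virtio I/O BAR:
 * 20 bytes of common header, plus 4 bytes of MSI-X vector registers when
 * MSI-X is enabled.  Same definition as in <linux/virtio_pci.h>.
 */
#define VIRTIO_PCI_CONFIG_OFF(msix_enabled) ((msix_enabled) ? 24 : 20)

/*
 * Size the virtual BAR0 buffer for the larger (MSI-X enabled) layout so
 * the emulation works whether or not the guest enables MSI-X, as the
 * quoted init code does.
 */
static size_t bar0_virtual_buf_size(size_t device_config_size)
{
	return VIRTIO_PCI_CONFIG_OFF(1) + device_config_size;
}
```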



+
+static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
+	.name = "virtio-vfio-pci",
+	.init = vfio_pci_core_init_dev,
+	.release = vfio_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,

Could be vfio_pci_core_open_device(). Given the virtiovf-specific init func
is not called, virtiovf_pci_open_device() is essentially the same as the
core func.

We don't have vfio_pci_core_open_device() today as an exported function.

virtiovf_pci_open_device() matches both cases, so I don't see a real reason to export it now.

By the way, it follows other drivers in vfio, see for example here [1].

[1] https://elixir.bootlin.com/linux/v6.7-rc5/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1383



+
+static int virtiovf_pci_probe(struct pci_dev *pdev,
+			      const struct pci_device_id *id)
+{
+	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
+	struct virtiovf_pci_core_device *virtvdev;
+	int ret;
+
+	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
+	    !virtiovf_bar0_exists(pdev))
+		ops = &virtiovf_vfio_pci_tran_ops;

I'm confused here.

Why do we want to allow this driver to bind to a non-matching VF or
even a PF?

The intention is to allow binding any virtio-net device (i.e. a PF, or a VF which is not transitional capable), so that there is a single driver over VFIO for all virtio-net devices.

This enables any user space application to bind and use any virtio-net device without needing to care which kind it is.

In case the device is not transitional capable, it will simply use the generic vfio functionality.


If that is the intention, then the naming/description should be adjusted
throughout this patch to not be specific to the VF.

e.g. don't use the "virtiovf_" prefix...

The main functionality is to supply a transitional device to user space for the VF; hence the prefix and the description for this driver refer to the VF.

Let's stay with that.


the config option is generic:

+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO NET PCI devices"

but the description is specific to vf:

+          This provides support for exposing VIRTIO NET VF devices which support
+          legacy IO access, using the VFIO framework that can work with a legacy
+          virtio driver in the guest.

then the module description is generic again:

+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");


Yes, as the binding allows that, it looks fine to me.

Thanks,
Yishai



