Re: [PATCH V7 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

Yishai Hadas <yishaih@xxxxxxxxxx> · Thu, 14 Dec 2023 10:57:32 +0200

On 14/12/2023 8:07, Tian, Kevin wrote:
From: Yishai Hadas <yishaih@xxxxxxxxxx>
Sent: Wednesday, December 13, 2023 8:25 PM

On 13/12/2023 10:23, Tian, Kevin wrote:
From: Yishai Hadas <yishaih@xxxxxxxxxx>
Sent: Thursday, December 7, 2023 6:28 PM

+
+static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
+					char __user *buf, size_t count,
+					loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	size_t register_offset;
+	loff_t copy_offset;
+	size_t copy_count;
+	__le32 val32;
+	__le16 val16;
+	u8 val8;
+	int ret;
+
+	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
+	if (ret < 0)
+		return ret;
+
+	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
+				  &copy_offset, &copy_count,
&register_offset)) {
+		val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 +
register_offset, copy_count))
+			return -EFAULT;
+	}
+
+	if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
+	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
+				  &copy_offset, &copy_count,
&register_offset)) {
+		if (copy_from_user((void *)&val16 + register_offset, buf +
copy_offset,
+				   copy_count))
+			return -EFAULT;
+		val16 |= cpu_to_le16(PCI_COMMAND_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 +
register_offset,
+				 copy_count))
+			return -EFAULT;
+	}

the write handler calls vfio_pci_core_write() for PCI_COMMAND so
the core vconfig should have the latest copy of the IO bit value which
is copied to the user buffer by vfio_pci_core_read(). then not necessary
to update it again.

You assume the the 'vconfig' mechanism/flow is always applicable for
that specific field, this should be double-checked.
However, as for now the driver doesn't rely / use the vconfig for other
fields as it doesn't match and need a big refactor, I prefer to not rely
on it at all and have it here.

iiuc this driver does relies on vconfig for other fields. It first calls
vfio_pci_core_read() and then modifies selected fields which needs
special tweak in this driver.

No, there is no dependency at all on vconfig for other fields in the driver.

vfio_pci_core_read() for most of its fields including the PCI_COMMAND 
goes directly over the PCI API/flow to the device and doesn't use the 
vconfig.

So, we must save/restore the PCI_COMMAND on the driver context to have 
it properly reported/emulated the PCI_COMMAND_IO bit.

btw what virtio-net specific tweak is applied to PCI_COMMAND? You
basically cache the cmd value and then set PCI_COMMAND_IO if
it's set in the cached value. But this is already covered by pci
vconfig. If anything is broken there then we already have a big
trouble.

As I wrote above, this field (i.e.PCI_COMMAND) is not managed at all by 
vconfig, so we need to emulate it in the driver.

The trick for bar0 makes sense as it doesn't exist physically then
the vconfig mechanism may not give the expected result. But
I didn't see the same rationale for PCI_COMMAND.

+
+static ssize_t
+virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user
*buf,
+			size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
+		size_t register_offset;
+		loff_t copy_offset;
+		size_t copy_count;
+
+		if (range_intersect_range(pos, count, PCI_COMMAND,
sizeof(virtvdev->pci_cmd),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_cmd +
register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+
+		if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
+					  sizeof(virtvdev->pci_base_addr_0),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev-
pci_base_addr_0 + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+	}

wrap above into virtiovf_pci_write_config() to be symmetric with
the read path.

Upon the read path, we do the full flow and return to the user. Here we
just save some data and continue to call the core, so I'm not sure that
this worth a dedicated function.

I don't see a real difference.

for the read path you first call vfio_pci_core_read() then modifies some
fields.

for the write path you save some data then call vfio_pci_core_write().

However, this can be done, do you still suggest it for V8 ?

yes

OK, will add as part of V8.

+static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev;
+	int ret;
+
+	ret = vfio_pci_core_init_dev(core_vdev);
+	if (ret)
+		return ret;
+
+	pdev = virtvdev->core_device.pdev;
+	ret = virtiovf_read_notify_info(virtvdev);
+	if (ret)
+		return ret;
+
+	/* Being ready with a buffer that supports MSIX */
+	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
+				virtiovf_get_device_config_size(pdev-
device);

which code is relevant to MSIX?

The buffer size must include the MSIX part to match the virtio-net
specification.

As part of virtiovf_issue_legacy_rw_cmd() we may use the full buffer
upon read/write.

at least mention that MSIX is in the device config region otherwise
it's not helpful to people w/o virtio background.

MSIX is not in the device configuration region, it follows the common one.
In any case, it looks like this comment doesn't give any real value, 
I'll simply drop it in V8.

+
+static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
+	.name = "virtio-vfio-pci",
+	.init = vfio_pci_core_init_dev,
+	.release = vfio_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,

could be vfio_pci_core_open_device(). Given virtiovf specific init func
is not called  virtiovf_pci_open_device() is essentially same as the
core func.

We don't have today vfio_pci_core_open_device() as an exported function.

The virtiovf_pci_open_device() matches both cases so I don't see a real
reason to export it now.

By the way, it follows other drivers in vfio, see for example here [1].

[1]
https://elixir.bootlin.com/linux/v6.7-
rc5/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1383

ah, yes.

+
+static int virtiovf_pci_probe(struct pci_dev *pdev,
+			      const struct pci_device_id *id)
+{
+	const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
+	struct virtiovf_pci_core_device *virtvdev;
+	int ret;
+
+	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
+	    !virtiovf_bar0_exists(pdev))
+		ops = &virtiovf_vfio_pci_tran_ops;

I have a confusion here.

why do we want to allow this driver binding to non-matching VF or
even PF?

The intention is to allow the binding of any virtio-net device (i.e. PF,
VF which is not transitional capable) to have a single driver over VFIO
for all virtio-net devices.

This enables any user space application to bind and use any virtio-net
device without the need to care.

In case the device is not transitional capable, it will simply use the
generic vfio functionality.

Is it useful to print a message here?

I don't think so, we usually don't print such messages.
User space will work based on caps to know what is supported, as of 
other cases.

if that is the intention then the naming/description should be adjusted
to not specific to vf throughout this patch.

e.g. don't use "virtiovf_" prefix...

The main functionality is to supply the transitional device to user
space for the VF, hence the prefix and the description for that driver
refers to VF.

Let's stay with that.

ok

the config option is generic:

+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO NET PCI devices"

but the description is specific to vf:

+          This provides support for exposing VIRTIO NET VF devices which
support
+          legacy IO access, using the VFIO framework that can work with a
legacy
+          virtio driver in the guest.

then the module description is generic again:

+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");

Yes, as the binding allows that, it looks fine to me.

what about revising the kconfig message to make it clear that it's
for all virtio-net device with special trick to make VF as a
transitional device?

Kconfig doesn't need to get into the details of how things were done.
We already mention that it emulates I/O BAR, this seems to be good enough.

However, I'll improve it as you already suggested in the previous mail.

Yishai