Re: [PATCH v2 4/4] vfio: Report PASID capability via VFIO_DEVICE_FEATURE ioctl

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/4/18 08:21, Tian, Kevin wrote:
From: Alex Williamson <alex.williamson@xxxxxxxxxx>
Sent: Thursday, April 18, 2024 4:26 AM

On Wed, 17 Apr 2024 07:09:52 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

From: Alex Williamson <alex.williamson@xxxxxxxxxx>
Sent: Wednesday, April 17, 2024 1:57 AM

On Fri, 12 Apr 2024 01:21:21 -0700
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:

+ */
+struct vfio_device_feature_pasid {
+	__u16 capabilities;
+#define VFIO_DEVICE_PASID_CAP_EXEC	(1 << 0)
+#define VFIO_DEVICE_PASID_CAP_PRIV	(1 << 1)
+	__u8 width;
+	__u8 __reserved;
+};

Building on Kevin's comment on the cover letter, if we could describe
an offset for emulating a PASID capability, this seems like the place
we'd do it.  I think we're not doing that because we'd like an in-band
mechanism for a device to report unused config space, such as a DVSEC
capability, so that it can be implemented on a physical device.  As
noted in the commit log here, we'd also prefer not to bloat the kernel
with more device quirks.

In an ideal world we might be able to jump start support of that DVSEC
option by emulating the DVSEC capability on top of the PASID capability
for PFs, but unfortunately the PASID capability is 8 bytes while the
DVSEC capability is at least 12 bytes, so we can't implement that
generically either.

Yeah, that's a problem.


I don't know there's any good solution here or whether there's actually
any value to the PASID capability on a PF, but do we need to consider
leaving a field+flag here to describe the offset for that scenario?

Yes, I prefer to this way.

Would we then allow variant drivers to take advantage of it?  Does this
then turn into the quirk that we're trying to avoid in the kernel
rather than userspace and is that a problem?  Thanks,


We don't want to proactively pursue quirks in the kernel.

But if a variant driver exists for other reasons, I don't see why it
should be prohibited from deciding an offset to ease the
userspace. 😊

At that point we've turned the corner into an arbitrary policy decision
that I can't defend.  A "worthy" variant driver can implement something
through a side channel vfio API, but implementing that side channel
itself is not enough to justify a variant driver?  It doesn't make
sense.

Further, if we have a variant driver, why do we need a side channel for
the purpose of describing available config space when we expect devices
themselves to eventually describe the same through a DVSEC capability?
The purpose of enabling variant drivers is to enhance the functionality
of the device.  Adding an emulated DVSEC capability seems like a valid
enhancement to justify a variant driver to me.

So the more I think about it, it would be easy to add something here
that hints a location for an emulated PASID capability in the VMM, but
it would also be counterproductive to an end goal of having a DVSEC
capability that describes unused config space.  The very narrow scope
where that side-band channel would be useful is an unknown PF device
which doesn't implement a DVSEC capability and without intervention
simply behaves as it always has, without PASID support.

A vendor desiring such support can a) implement DVSEC in the hardware,
b) implement a variant driver emulating a DVSEC capability, or c)
directly modify the VMM to tell it where to place the PASID capability.
I also don't think we should exclude the possibility that b) could turn
into a shared variant driver that knows about multiple devices and has
a table of free config space for each.  Option c) is only the last
resort if there's not already 12 bytes of contiguous, aligned free
space to place a DVSEC capability.  That seems unlikely.

or b) could be a table in vfio_pci_config.c i.e. kind of making vfio-pci
as the shared variant driver.


At some point we need to define the format and use of this DVSEC.  Do
we allow (not require) one at every gap in config space that's at least
12-bytes long and adjust the DVSEC Length to describe longer gaps, or do

Does PCI spec allows multiple same-type capabilities co-existing?

For DVSEC, it allows. Below is the sentence from PCIe spec.

A single PCI Express Function or RCRB is permitted to contain multiple DVSEC Capability structures


we use a single DVSEC to describe a table of ranges throughout extended
(maybe even conventional) config space?  The former seems easier,

this might be challenging as the table itself requires a contiguous
large free block.

especially if we expect a device has a large block of free space,
enough for multiple emulated capabilities and described by a single
DVSEC.  Thanks,


this is a good point. The ATS and PRI capability do not exist in VF as
well. They need to be emulated.


yes that sounds simpler.

--
Regards,
Yi Liu




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux