Re: [PATCH v2 0/4] vfio-pci support pasid attach/detach

Yi Liu <yi.l.liu@xxxxxxxxx> · Thu, 18 Apr 2024 17:03:15 +0800

On 2024/4/18 08:06, Tian, Kevin wrote:
From: Alex Williamson <alex.williamson@xxxxxxxxxx>
Sent: Thursday, April 18, 2024 7:02 AM

On Wed, 17 Apr 2024 09:20:51 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

On Wed, Apr 17, 2024 at 07:16:05AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe <jgg@xxxxxxxxxx>
Sent: Wednesday, April 17, 2024 1:50 AM

On Tue, Apr 16, 2024 at 08:38:50AM +0000, Tian, Kevin wrote:
From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
Sent: Friday, April 12, 2024 4:21 PM

A userspace VMM is supposed to get the details of the device's PASID
capability and assemble a virtual PASID capability at a proper offset
in the virtual PCI configuration space. However, how to get the
available offsets is still an open question. Devices may have hidden
bits that are not in the PCI cap chain. For now, there are two options
to get the available offsets.[2]

- Report the available offsets via ioctl. This requires device-specific
   logic to provide available offsets, e.g., a vfio-pci variant driver.
   Or the device may provide the available offset by DVSEC.
- Store the available offsets in a static table in the userspace VMM.
   The VMM gets the empty offsets from this table.

I'm not a fan of requesting a variant driver for every PASID-capable
VF just for the purpose of reporting a free range in the PCI config
space.

It's easier to do that quirk in userspace.

But I like Alex's original comment that at least for PF there is no
reason to hide the offset. There could be a flag+field to communicate
it. Or if there will be a new variant VF driver for other purposes,
e.g. migration, it can certainly fill the field too.

Yes, since this has been such a sticking point can we get a clean
series that just enables it for PF and then come with a solution for
VF?

sure but we at least need to reach consensus on a minimal required
uapi covering both PF/VF to move forward so the user doesn't need
to touch different contracts for PF vs. VF.

Do we? The situation where the VMM needs to wholly make up a PASID
capability seems completely new and separate from just using an
existing PASID capability as in the PF case.

But we don't actually expose the PASID capability on the PF and as
argued in patch 4/ we can't because it would break existing userspace.
Come back to this statement.

Does 'break' mean that legacy QEMU will crash due to a guest write
to the read-only PASID capability, or just a conceptually functional
break i.e. non-faithful emulation due to writes being dropped?

If the latter it's probably not a bad idea to allow exposing the PASID
capability on the PF as a sane guest shouldn't enable the PASID
capability w/o seeing vIOMMU supporting PASID. And there is no
status bit defined in the PASID capability to check back, so even
if an insane guest wants to blindly enable PASID, it will simply
write and be done. The only niche case is that the enable bits are
defined as RW so ideally reading back those bits should get the
latest written value. But probably this can be tolerated?

With that then should we consider exposing the PASID capability
in PCI config space as the first option? For PF it's simple as how
other caps are exposed. For VF a variant driver can also fake the
PASID capability or emulate a DVSEC capability for unused space
(to motivate the physical implementation so no variant driver is
required in the future)

If the kernel exposes the PASID cap for PF the same way as other caps, and
in the meantime the variant driver chooses to emulate a DVSEC cap, then
userspace follows the steps below to expose the PASID cap to the VM.

1) Check if a PASID cap is already present in the virtual config space
   read from the kernel. If not, but the user wants PASID, go to step 2).
2) Userspace invokes VFIO_DEVICE_FEATURE to check if the device supports
   the PASID cap. If yes, go to step 3).
3) Userspace gets an available offset by reading the DVSEC cap.
4) Userspace assembles a PASID cap and inserts it into the vconfig space.

For PF, step 1) is enough. For VF, it needs to go through all 4 steps.
This is a bit different from what we planned at the beginning. But sounds
doable if we want to pursue the staging direction.
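
To make the four steps concrete, here is a rough sketch of the userspace
flow over a config space buffer. It is only an illustration under several
assumptions, not a proposal for the actual uapi: the DVSEC layout that
reports a free offset is not defined anywhere yet, so HYP_DVSEC_FREE_OFF is
a made-up field position; the VFIO_DEVICE_FEATURE probe of step 2 is
modeled as a plain boolean argument; and the function names are
hypothetical. The extended capability IDs (PASID 0x1B, DVSEC 0x23) and the
Max PASID Width field position follow the PCIe definitions.

```python
import struct

PCI_CFG_SPACE_SIZE = 0x100     # extended capabilities start at this offset
PCI_EXT_CAP_ID_PASID = 0x1B
PCI_EXT_CAP_ID_DVSEC = 0x23
PASID_CAP_SIZE = 8             # header (4) + capability (2) + control (2)
HYP_DVSEC_FREE_OFF = 0x0A      # ASSUMPTION: hypothetical 16-bit "free offset" field


def walk_ext_caps(cfg):
    """Yield (offset, header) for each extended capability in the chain."""
    pos, seen = PCI_CFG_SPACE_SIZE, set()
    while PCI_CFG_SPACE_SIZE <= pos <= len(cfg) - 4 and pos not in seen:
        seen.add(pos)              # guard against a looping chain
        (hdr,) = struct.unpack_from("<I", cfg, pos)
        if hdr in (0, 0xFFFFFFFF):
            return
        yield pos, hdr
        pos = (hdr >> 20) & 0xFFC  # next pointer lives in bits 31:20


def find_ext_cap(cfg, cap_id):
    """Return the offset of the capability with cap_id, or 0 if absent."""
    for pos, hdr in walk_ext_caps(cfg):
        if hdr & 0xFFFF == cap_id:
            return pos
    return 0


def expose_pasid(vcfg, device_supports_pasid, pasid_width):
    """Return the offset of the PASID cap in vcfg, or 0 if it cannot be exposed."""
    # Step 1: a PF may already carry the PASID cap in the virtual config space.
    pos = find_ext_cap(vcfg, PCI_EXT_CAP_ID_PASID)
    if pos:
        return pos
    # Step 2: stands in for the VFIO_DEVICE_FEATURE query.
    if not device_supports_pasid:
        return 0
    # Step 3: read the free offset from the (hypothetical) DVSEC layout.
    dvsec = find_ext_cap(vcfg, PCI_EXT_CAP_ID_DVSEC)
    if not dvsec:
        return 0
    (free,) = struct.unpack_from("<H", vcfg, dvsec + HYP_DVSEC_FREE_OFF)
    if free < PCI_CFG_SPACE_SIZE or free % 4 or free + PASID_CAP_SIZE > len(vcfg):
        return 0
    # Step 4: assemble the PASID cap, then link it from the tail of the chain.
    tail = [p for p, _ in walk_ext_caps(vcfg)][-1]
    struct.pack_into("<I", vcfg, free, PCI_EXT_CAP_ID_PASID | (1 << 16))
    # Capability register: Max PASID Width in bits 12:8; control register: 0.
    struct.pack_into("<HH", vcfg, free + 4, (pasid_width & 0x1F) << 8, 0)
    (hdr,) = struct.unpack_from("<I", vcfg, tail)
    struct.pack_into("<I", vcfg, tail, (hdr & 0x000FFFFF) | (free << 20))
    return free
```

For a PF the function returns at step 1 without modifying the buffer. For a
VF it only succeeds when both the feature probe and the DVSEC lookup
succeed, which matches the staging described above.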

--
Regards,
Yi Liu