On 2024/4/19 21:59, Jason Gunthorpe wrote:
On Thu, Apr 18, 2024 at 02:37:47PM -0600, Alex Williamson wrote:
Some degree of inconsistency is likely tolerated, the guest is unlikely
to check that a RW bit was set or cleared. How would we virtualize the
control registers for a VF and are they similarly virtualized for a PF
or would we allow the guest to manipulate the physical PASID control
registers?
No, the OS owns the physical PASID control. If the platform IOMMU
knows how to parse PASID then PASID support is turned on and left on
at boot time.
I think you mean the host OS, right?
There should be no guest visible difference to not supporting global
PASID disable, and we can't even implement it for VFs anyhow.
Same sort of argument for ATS/etc
If the kernel exposes the PASID cap for a PF the same as other caps, and
meanwhile the variant driver chooses to emulate a DVSEC cap, then userspace
follows the steps below to expose the PASID cap to the VM.
If we have a variant driver, why wouldn't it expose an emulated PASID
capability rather than a DVSEC if we're choosing to expose PASID for
PFs?
Indeed, also an option. Supplying the DVSEC is probably simpler and
addresses other synthesized capability blocks in future. VMM is a
better place to build various synthetic blocks in general, IMHO.
New VMM's could parse the PF PASID cap and add it to its list of "free
space"
We may also be overdoing it here..
Maybe if the VMM wants to enable PASID we should flip the logic and
the VMM should assume that unused config space is safe to use. Only
devices that violate that rule need to join an ID list and provide a
DVSEC/free space list/etc.
So, if the kernel decides to hide a specific physical capability, the
space occupied by that capability would be considered free to use as
well, right?
I'm guessing that list will be pretty small and hopefully will not
grow.
Is there any channel to collect this kind of info? :)
It is easy and better for future devices to wrap their hidden
registers in a private DVSEC.
Hmm, do you mean wrapping the registers in a DVSEC so that userspace can
work out the free space by iterating the cap chain? Or do you still mean
indicating the free space via a DVSEC? I guess the former.
Then I'd suggest just writing the special list in a text file and
leaving it in the VMM side.. Users can adjust the text file right away
if they have old and troublesome devices and all VMMs can share it.
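To make the text-file idea concrete, here is a rough sketch of how a VMM
might parse such a list. The file format is purely an assumption made up
for illustration (one "vendor:device" ID per line, followed by zero or
more "start-end" ranges of extended config space considered safe to use,
with "#" comments); nothing in this thread defines it, and
parse_quirk_list() is a hypothetical helper name.

```python
def parse_quirk_list(text):
    """Parse a hypothetical per-device free-space list.

    Assumed line format (not defined anywhere in this thread):
        <vendor>:<device> [<start>-<end> ...]   # optional comment
    All numbers are hex; ranges are half-open [start, end) offsets in
    PCIe extended config space (0x100..0x1000).
    """
    quirks = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ident, *ranges = line.split()
        vendor, device = (int(x, 16) for x in ident.split(":"))
        parsed = []
        for r in ranges:
            start, end = (int(x, 16) for x in r.split("-"))
            if not (0x100 <= start < end <= 0x1000):
                raise ValueError(f"line {lineno}: bad range {r!r}")
            parsed.append((start, end))
        # An entry with no ranges means "no safe free space known".
        quirks[(vendor, device)] = parsed
    return quirks
```

A device listed with no ranges would simply not get a synthesized PASID
cap. Keeping the list as plain text is what lets users patch it locally
for an old device without rebuilding the VMM.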
So for existing devices that have both a PASID cap and hidden registers,
userspace should add them to the special list and work out the free space
by referring to the file. For devices that only have a PASID cap, or that
keep their hidden registers in a DVSEC, userspace finds free space by
iterating the cap chain. This seems general enough for today and the
future.
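As a rough illustration of "finding free space by iterating the cap
chain", the sketch below walks the PCIe extended capability list in a
4 KiB config space image and reports the gaps. The header layout (cap ID
in bits 15:0, next pointer in bits 31:20) and the DVSEC length field
(DVSEC Header 1, bits 31:20) follow the PCIe spec. Everything else is an
assumption for illustration: the function names, the tiny KNOWN_SIZES
table, and the conservative rule that a capability of unknown length
occupies everything up to the next capability. It cannot see hidden
registers outside any capability, which is exactly the case the special
list is for.

```python
import struct

EXT_CAP_START = 0x100      # extended capabilities begin here
EXT_CAP_ID_PASID = 0x1B
EXT_CAP_ID_DVSEC = 0x23

# Assumed sizes for caps whose header does not encode a length.
# Illustrative only; a real VMM would need a fuller table.
KNOWN_SIZES = {EXT_CAP_ID_PASID: 8}

def walk_ext_caps(cfg):
    """Return [(offset, cap_id)] for each extended capability."""
    caps, seen, offset = [], set(), EXT_CAP_START
    while offset >= EXT_CAP_START and offset + 4 <= len(cfg):
        if offset in seen:          # guard against a looping chain
            break
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            break
        caps.append((offset, header & 0xFFFF))
        offset = (header >> 20) & 0xFFC   # next pointer, dword aligned
    return caps

def cap_size(cfg, offset, cap_id):
    """Return the capability length in bytes, or None if unknown."""
    if cap_id == EXT_CAP_ID_DVSEC:
        (hdr1,) = struct.unpack_from("<I", cfg, offset + 4)
        return ((hdr1 >> 20) & 0xFFF) or None
    return KNOWN_SIZES.get(cap_id)

def free_ranges(cfg):
    """Return half-open [start, end) gaps not covered by any capability."""
    caps = sorted(walk_ext_caps(cfg))
    free, cursor = [], EXT_CAP_START
    for i, (off, cid) in enumerate(caps):
        size = cap_size(cfg, off, cid)
        if size is None:
            # Unknown length: assume it runs up to the next capability.
            nxt = caps[i + 1][0] if i + 1 < len(caps) else len(cfg)
            size = nxt - off
        if off > cursor:
            free.append((cursor, off))
        cursor = max(cursor, off + size)
    if cursor < len(cfg):
        free.append((cursor, len(cfg)))
    return free
```

For a device on the special list, the VMM would skip this walk and use
the ranges from the file instead.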
1) Check if a PASID cap is already present in the virtual config space
read from the kernel. If not, but the user wants PASID, go to step 2).
2) Userspace invokes VFIO_DEVICE_FEATURE to check if the device supports
the PASID cap. If yes, go to step 3).
Why do we need the vfio feature interface if a physical or virtual PASID
capability on the device exposes the same info?
Still need to check if the platform, os, iommu, etc are all OK with
enabling PASID support before the viommu advertises it.
Does this mean we don't expose a physical or virtual PASID cap? Otherwise,
the host kernel could check whether PASID is enabled before exposing the
PASID cap.
--
Regards,
Yi Liu