> From: kvm-owner@xxxxxxxxxxxxxxx <kvm-owner@xxxxxxxxxxxxxxx> On Behalf > Of Niklas Schnelle > Sent: Friday, August 28, 2020 5:10 PM > To: Bjorn Helgaas <helgaas@xxxxxxxxxx>; Alex Williamson > <alex.williamson@xxxxxxxxxx> > [...] > >> > >> FWIW, pci_physfn() never returns NULL, it returns the provided pdev if > >> is_virtfn is not set. This proposal wouldn't change that return value. > >> AIUI pci_physfn(), the caller needs to test that the returned device is > >> different from the provided device if there's really code that wants to > >> traverse to the PF. > > > > Oh, so this VF has is_virtfn==0. That seems weird. There are lots of > > other ways that a VF is different: Vendor/Device IDs are 0xffff, BARs > > are zeroes, etc. > > > > It sounds like you're sweeping those under the rug by avoiding the > > normal enumeration path (e.g., you don't have to size the BARs), but > > if it actually is a VF, it seems like there might be fewer surprises > > if we treat it as one. > > > > Why don't you just set is_virtfn=1 since it *is* a VF, and then deal > > with the special cases where you want to touch the PF? > > > > Bjorn > > > > As we are always running under at least a machine level hypervisor > we're somewhat in the same situation as e.g. a KVM guest in > that the VFs we see have some emulation that makes them act more like > normal PCI functions. It just so happens that the machine level hypervisor > does not emulate the PCI_COMMAND_MEMORY, it does emulate BARs and > Vendor/Device IDs > though. > So is_virtfn is 0 for some VF for the same reason it is 0 on > KVM/ESXi/HyperV/Jailhouse… > guests on other architectures. I wonder whether it's a good idea to also find a way to set is_virtfn for normal KVM guest which get a vf assigned. There are other cases where faithful emulation of certain PCI capabilities is difficult, e.g. when enabling guest SVA related features (PASID/ATS/PRS). Per PCIe spec, some or all fields of those capabilities are shared between PF and VF. Among them: 1) Some could be emulated properly and indirectly reflected in hardware, e.g. Intel VT-d allows additional control per VF about whether to accept page request, execute/privileged permission, etc. thus allowing VF-specific control even when device-side setting is shared; 2) Some could be purely emulated in software and it's harmless to leave the hardware following PF setting, e.g. ATS enable, STU(?), outstanding page request allocation, etc.; 3) However, I didn’t see a clean way of emulating page_request_ctrl.reset and page_request_status.stopped. Those two have clear definition about outstanding page requests. They are shared thus we cannot issue physical action just due to request on one VF, while pure software emulation cannot guarantee the desired expectation. Of course this issue also exists even on bare metal - pci_enable/disable/reset_pri just do nothing for vf. But there is chance to mitigate (e.g. timeout), but not possible in guest if the guest doesn't know it's actually a VF. Setting is_virtfn=1 allows guest to be cooperative like running together with PF driver. But there is an ordering issue. The guest knows whether a device is VF only when the VF driver is loaded (based on PCI_ID), but related capabilities might be already enabled when attaching the device to IOMMU (at least for intel_iommu). But suppose it's not a hard fix. Last, detached vf is not a PCISIG definition. So the host still needs to do proper emulation (even not faithful) of those capabilities for guests who don't recognize detached vf. Thoughts? Thanks Kevin