On Mon, 2014-08-18 at 09:00 +0800, Zhang Haoyu wrote:
> >> >> Hi, all
> >> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
> >> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ...,
> >> >> so how can I unbind only (some of) the VFs but not the PF?
> >> >> I read the kernel doc vfio.txt, I'm not sure whether I should unbind all of the devices which belong to one iommu_group?
> >> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappear.
> >> >> I think I misunderstand something,
> >> >> any advice?
> >> >
> >> >This occurs when the PF is installed behind components in the system
> >> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
> >> >contains both the PF and the VF because upstream transactions can be
> >> >re-routed downstream by these non-ACS components before being translated
> >> >by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
> >> >version and we might be able to give you some advice on how to work
> >> >around the problem.  Thanks,
> >> >
> >> # lspci | grep Ether
> >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >> 02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >> 08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
> >>
> >> I want to direct-assign the VFs of intel 82599(02:00.0 or 02:00.1) to VM,
> >> # lspci -t
> >> -[0000:00]-+-00.0
> >>            +-01.0-[01]--
> >>            +-01.1-[02-03]--+-00.0
> >>            |               \-00.1
> >>            +-02.0
> >>            +-06.0-[04]--
> >>            +-16.0
> >>            +-1a.0
> >>            +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]--
> >>            |                               +-05.0-[08]--+-00.0
> >>            |                               |            \-00.1
> >>            |                               +-06.0-[09]--+-00.0
> >>            |                               |            \-00.1
> >>            |                               +-08.0-[0a]--+-00.0
> >>            |                               |            \-00.1
> >>            |                               \-09.0-[0b]--+-00.0
> >>            |                                            \-00.1
> >>            +-1d.0
> >>            +-1e.0-[0c]----00.0
> >>            +-1f.0
> >>            +-1f.2
> >>            \-1f.3
> >>
> >> # lspci -vvv -s 02:00.0
> >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>         Latency: 0, Cache Line Size: 64 bytes
> >>         Interrupt: pin A routed to IRQ 17
> >>         Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
> >>         Region 2: I/O ports at e020 [size=32]
> >>         Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K]
> >>         Capabilities: [40] Power Management version 3
> >>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> >>         Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
> >>         Capabilities: [a0] Express (v2) Endpoint, MSI 00
> >>         Capabilities: [e0] Vital Product Data
> >>         Capabilities: [100 v1] Advanced Error Reporting
> >>         Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2
> >>         Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> >>         Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> >>         Kernel driver in use: ixgbe
> >>
> >> # lspci -vvv -s 00:01.1
> >> 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
> >>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>         Latency: 0, Cache Line Size: 64 bytes
> >>         Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
> >>         I/O behind bridge: 0000e000-0000efff
> >>         Memory behind bridge: f7e00000-f7efffff
> >>         Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
> >>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> >>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> >>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >>         Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
> >>         Capabilities: [80] Power Management version 3
> >>         Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >>         Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> >>         Capabilities: [100 v1] Virtual Channel
> >>         Capabilities: [140 v1] Root Complex Link
> >>         Capabilities: [d94 v1] #19
> >>         Kernel driver in use: pcieport
> >>
> >> The intel 82599(02:00.0 or 02:00.1) is behind the pci bridge (00:01.1),
> >> does the 00:01.1 PCI bridge support ACS?
> >
> >It does not and that's exactly the problem.  We must assume that the
> >root port can redirect a transaction from a subordinate device back to
> >another subordinate device without IOMMU translation when ACS support is
> >not present.  If you had a device plugged in below 00:01.0, we'd also
> >need to assume that non-IOMMU translated peer-to-peer between devices
> >behind either function, 00:01.0 or 00:01.1, is possible.
> >
> >Intel has indicated that processor root ports for all Xeon class
> >processors should support ACS and has verified isolation for PCH based
> >root ports allowing us to support quirks in place of ACS support.  I'm
> >not aware of any efforts at Intel to verify isolation capabilities of
> >root ports on client processors.  They are however aware that lack of
> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
> >see future products with ACS support.
> >
> >Chances are good that the PCH root port at 00:1c.0 is supported by an
> >ACS quirk, but it seems that your system has a PCIe switch below the
> >root port.  If the PCIe switch downstream ports support ACS, then you
> >may be able to move the 82599 to the empty slot at bus 07 to separate
> >the VFs into different IOMMU groups.  Thanks,
> >
> Thanks, Alex,
> how can I tell whether a PCI bridge/device supports the ACS capability?
>
> I perform "lspci -vvv -s | grep -i ACS", nothing matched.
> # lspci -vvv -s 00:1c.0
> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])

Ideally there would be capabilities for it, something like:

Capabilities [xxx] Access Control Services...

But, Intel failed to provide this, so we enable "effective" ACS
capabilities via a quirk:

drivers/pci/quirks.c:
/*
 * Many Intel PCH root ports do provide ACS-like features to disable peer
 * transactions and validate bus numbers in requests, but do not provide an
 * actual PCIe ACS capability.  This is the list of device IDs known to fall
 * into that category as provided by Intel in Red Hat bugzilla 1037684.
 */
static const u16 pci_quirk_intel_pch_acs_ids[] = {
        /* Ibexpeak PCH */
        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
        /* Cougarpoint PCH */
        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
        /* Pantherpoint PCH */
        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
        /* Lynxpoint-H PCH */
        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
        /* Lynxpoint-LP PCH */
        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
        /* Wildcat PCH */
        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
        /* Patsburg (X79) PCH */
        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
};

Hopefully if you run 'lspci -n', you'll see your device ID listed among
these.  We don't currently have any quirks for PCIe switches, so if your
IOMMU group is still bigger than it should be, that may be the reason.
Thanks,

Alex
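
The 'lspci -n' comparison suggested above can be automated. What follows is only a rough userspace sketch in Python, not anything taken from the kernel: the function names are made up for illustration, the device IDs are copied from the pci_quirk_intel_pch_acs_ids[] table quoted above, and the sample lines in the usage note are hypothetical. It assumes the usual 'lspci -n' line layout (slot, class, vendor:device) and only checks the vendor and device ID; the kernel may apply further checks that are not modelled here.

```python
import re

INTEL_VENDOR_ID = 0x8086

# Device IDs copied from the pci_quirk_intel_pch_acs_ids[] table quoted above.
PCH_ACS_QUIRK_IDS = frozenset(
    list(range(0x3B42, 0x3B52))        # Ibexpeak PCH
    + list(range(0x1C10, 0x1C20))      # Cougarpoint PCH
    + list(range(0x1E10, 0x1E20))      # Pantherpoint PCH
    + list(range(0x8C10, 0x8C20))      # Lynxpoint-H PCH
    + list(range(0x9C10, 0x9C1C))      # Lynxpoint-LP PCH
    + list(range(0x9C90, 0x9C9C))      # Wildcat PCH
    + list(range(0x1D10, 0x1D20, 2))   # Patsburg (X79) PCH, even IDs only
)

# Assumed 'lspci -n' layout: "<slot> <class>: <vendor>:<device> [(rev xx)]"
LSPCI_N_RE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
)


def parse_lspci_n_line(line):
    """Return (slot, vendor_id, device_id), or None if the line does not parse."""
    match = LSPCI_N_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("slot"),
        int(match.group("vendor"), 16),
        int(match.group("device"), 16),
    )


def has_pch_acs_quirk_id(line):
    """True if the line names an Intel device whose ID is in the quoted table."""
    parsed = parse_lspci_n_line(line)
    if parsed is None:
        return False
    _slot, vendor, device = parsed
    return vendor == INTEL_VENDOR_ID and device in PCH_ACS_QUIRK_IDS


if __name__ == "__main__":
    import sys

    # Usage: lspci -n | python3 this_script.py
    for raw in sys.stdin:
        if has_pch_acs_quirk_id(raw):
            print(raw.strip())
```

Piping 'lspci -n' into the script prints only the lines whose device ID appears in the table. For instance, a hypothetical line such as "00:1c.0 0604: 8086:1c10 (rev b5)" would be printed, while a line with a device ID outside the table would not. A match only says the ID is listed; whether the resulting IOMMU groups are actually split still depends on the kernel version and on everything else in the path to the device, such as the PCIe switch mentioned above.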