This patch set enables SRIOV on POWER8.

This is not the final version; some patches rely on un-merged patches.

The general idea is to put each VF in its own PE and allocate the necessary
resources, like DMA/IOMMU_TABLE.

One thing special for a VF PE is that we use M64BT to cover the IOV BAR. This
means we need to do some hacks on the PCI device's resources:
1. Expand the IOV BAR properly.
2. Shift the IOV BAR properly.
3. IOV BAR alignment is the total size instead of an individual size.
4. Take the IOV BAR alignment into consideration in the sizing and assigning.

Test Environment:
       The SRIOV devices tested are Emulex Lancer and Mellanox ConnectX-3.

Example of passing a VF through to a guest with vfio:
	1. install the necessary modules
	   modprobe vfio
	   modprobe vfio-pci

	2. retrieve the iommu_group the device belongs to
	   readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
	   ../../../../kernel/iommu_groups/26
	   This means it belongs to group 26

	3. see how many devices are under this iommu_group
	   ls /sys/kernel/iommu_groups/26/devices/

	4. unbind the original driver and bind to the vfio-pci driver
	   echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
	   echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
	   Note: this should be done for each device in the same iommu_group

	5. start qemu and pass the device through vfio
	   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
		   -M pseries -m 2048 -enable-kvm -nographic \
		   -drive file=/home/ywywyang/kvm/fc19.img \
		   -monitor telnet:localhost:5435,server,nowait -boot cd \
		   -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6"

Verify that it is the VF that responds:
	1. ping from a machine in the same subnet (the same broadcast domain)
	2. run arp -n on this machine
	   9.115.251.20             ether   00:00:c9:df:ed:bf   C        eth0
	3. run ifconfig in the guest
	   # ifconfig eth1
	   eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
	        inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
	        inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20<link>
	        ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
	        RX packets 175  bytes 13278 (12.9 KiB)
	        RX errors 0  dropped 0  overruns 0  frame 0
	        TX packets 58  bytes 9276 (9.0 KiB)
	        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
	4. They have the same MAC address

	Note: make sure you shut down the other network interfaces in the guest.

---
v2 -> v3:
   1. change the return type of virtfn_bus/virtfn_devfn to int
      change the name of these two functions to
      pci_iov_virtfn_bus/pci_iov_virtfn_devfn
   2. drop the second parameter of pcibios_sriov_disable()
   3. use data instead of pe in "ppc/pnv: allocate pe->iommu_table dynamically"
   4. rename __pci_sriov_resource_size to pcibios_sriov_resource_size
   5. rename __pci_sriov_resource_alignment to pcibios_sriov_resource_alignment
v1 -> v2:
   1. change the return value of virtfn_bus/virtfn_devfn to 0
   2. move some TCE related macro definitions to
      arch/powerpc/platforms/powernv/pci.h
   3. fix the __pci_sriov_resource_alignment on the powernv platform
      During the sizing stage, the IOV BAR is truncated to 0, which
      affects the order of allocation. Fix this so that BARs are
      allocated in order of their alignment.
v0 -> v1:
   1. Improve the change log for
      "PCI: Add weak __pci_sriov_resource_size() interface"
      "PCI: Add weak __pci_sriov_resource_alignment() interface"
      "PCI: take additional IOV BAR alignment in sizing and assigning"
   2. Wrap VF PE code in CONFIG_PCI_IOV
   3. Did regression test on P7.
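
For convenience, steps 2 to 4 of the vfio example above could be scripted
roughly as follows. This is only a sketch and not part of the series. The
function names and the SYSFS override are hypothetical, and the explicit
write to the driver's bind file is an assumption for the case where several
VFs share one vendor/device ID and the ID has already been registered.

```shell
#!/bin/sh
# Sketch: hand every device in a VF's iommu_group to vfio-pci.
# SYSFS is a hypothetical override so the logic can be tried on a fake tree.
SYSFS=${SYSFS:-/sys}

# Print the iommu_group number of a PCI device, e.g. "26".
iommu_group_of() {
    basename "$(readlink "$SYSFS/bus/pci/devices/$1/iommu_group")"
}

# Unbind each device in the group from its driver and give it to vfio-pci.
bind_group_to_vfio() {
    group=$1
    for path in "$SYSFS/kernel/iommu_groups/$group/devices/"*; do
        bdf=$(basename "$path")
        dev="$SYSFS/bus/pci/devices/$bdf"
        # sysfs reports IDs as 0x1102; new_id expects them without the prefix.
        vendor=$(sed 's/^0x//' "$dev/vendor")
        device=$(sed 's/^0x//' "$dev/device")
        if [ -e "$dev/driver/unbind" ]; then
            echo "$bdf" > "$dev/driver/unbind"
        fi
        # Registering an ID that is already known may fail; ignore that.
        echo "$vendor $device" \
            > "$SYSFS/bus/pci/drivers/vfio-pci/new_id" 2>/dev/null || true
        # Assumption: if no driver picked the device up, bind it explicitly.
        if [ ! -e "$dev/driver" ]; then
            echo "$bdf" > "$SYSFS/bus/pci/drivers/vfio-pci/bind"
        fi
    done
}
```

With the device from the example, this would be run as root as
bind_group_to_vfio "$(iommu_group_of 0000:06:0d.0)" after loading the vfio
and vfio-pci modules.
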

Wei Yang (17):
  pci/iov: Export interface for retrieve VF's BDF
  pci/of: Match PCI VFs to dev-tree nodes dynamically
  ppc/pci: don't unset pci resources for VFs
  PCI: SRIOV: add VF enable/disable hook
  ppc/pnv: user macro to define the TCE size
  ppc/pnv: allocate pe->iommu_table dynamically
  ppc/pnv: Add function to deconfig a PE
  PCI: Add weak pcibios_sriov_resource_size() interface
  PCI: Add weak pcibios_sriov_resource_alignment() interface
  PCI: take additional IOV BAR alignment in sizing and assigning
  ppc/pnv: Expand VF resources according to the number of total_pe
  powerpc/powernv: implement pcibios_sriov_resource_alignment on powernv
  powerpc/powernv: shift VF resource with an offset
  ppc/pci: create/release dev-tree node for VFs
  powerpc/powernv: allocate VF PE
  ppc/pci: Expanding IOV BAR, with m64_per_iov supported
  ppc/pnv: Group VF PE when IOV BAR is big on PHB3

 arch/powerpc/include/asm/iommu.h          |    3 +
 arch/powerpc/include/asm/machdep.h        |    7 +
 arch/powerpc/include/asm/pci-bridge.h     |    7 +
 arch/powerpc/include/asm/tce.h            |    3 +-
 arch/powerpc/kernel/pci-common.c          |   29 +
 arch/powerpc/platforms/powernv/Kconfig    |    1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  824 +++++++++++++++++++++++++++--
 arch/powerpc/platforms/powernv/pci.c      |   22 +-
 arch/powerpc/platforms/powernv/pci.h      |   17 +-
 drivers/pci/iov.c                         |   84 ++-
 drivers/pci/pci.h                         |   21 -
 drivers/pci/setup-bus.c                   |   66 ++-
 include/linux/pci.h                       |   46 ++
 13 files changed, 1041 insertions(+), 89 deletions(-)

-- 
1.7.9.5