Hi Shameer,

On 11/12/19 6:56 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Shameerali Kolothum Thodi
>> Sent: 12 November 2019 14:21
>> To: 'Auger Eric' <eric.auger@xxxxxxxxxx>; eric.auger.pro@xxxxxxxxx;
>> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> kvm@xxxxxxxxxxxxxxx; kvmarm@xxxxxxxxxxxxxxxxxxxxx; joro@xxxxxxxxxx;
>> alex.williamson@xxxxxxxxxx; jacob.jun.pan@xxxxxxxxxxxxxxx;
>> yi.l.liu@xxxxxxxxx; jean-philippe.brucker@xxxxxxx; will.deacon@xxxxxxx;
>> robin.murphy@xxxxxxx
>> Cc: kevin.tian@xxxxxxxxx; vincent.stehle@xxxxxxx; ashok.raj@xxxxxxxxx;
>> marc.zyngier@xxxxxxx; tina.zhang@xxxxxxxxx; Linuxarm
>> <linuxarm@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>
>> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
> [...]
>>>>>> I am trying to get this running on one of our platforms that has
>>>>>> smmuv3 dual stage support. I am seeing some issues with this when an
>>>>>> ixgbe vf dev is made pass-through and is behind a vSMMUv3 in the
>>>>>> Guest.
>>>>>>
>>>>>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>>>>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>>>>>>
>>>>>> And this is my Qemu cmd line,
>>>>>>
>>>>>> ./qemu-system-aarch64 \
>>>>>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
>>>>>> -kernel Image \
>>>>>> -drive if=none,file=ubuntu,id=fs \
>>>>>> -device virtio-blk-device,drive=fs \
>>>>>> -device vfio-pci,host=0000:01:10.1 \
>>>>>> -bios QEMU_EFI.fd \
>>>>>> -net none \
>>>>>> -m 4G \
>>>>>> -nographic -D -d -enable-kvm \
>>>>>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
>>>>>>
>>>>>> The basic ping from the Guest works fine,
>>>>>>
>>>>>> root@ubuntu:~# ping 10.202.225.185
>>>>>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
>>>>>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
>>>>>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
>>>>>> ...
>>>>>>
>>>>>> But if I increase the ping packet size,
>>>>>>
>>>>>> root@ubuntu:~# ping -s 1024 10.202.225.185
>>>>>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
>>>>>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
>>>>>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
>>>>>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
>>>>>>
>>>>>> And from the Host kernel I get,
>>>>>>
>>>>>> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
>>>>>> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>>>>> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>>>>> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
>>>>>> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>>>>> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
>>>>>>
>>>>>> Also noted that iperf does not work, as it fails to establish the
>>>>>> connection with the iperf server.
>>>>>>
>>>>>> Please find attached the trace logs (vfio*, smmuv3*) from Qemu for
>>>>>> your reference. I haven't debugged this further yet and thought of
>>>>>> checking with you whether this is something you have seen already or
>>>>>> not. Or maybe I am missing something here?
>>>>>
>>>>> Please can you try to edit and modify hw/vfio/common.c, function
>>>>> vfio_iommu_unmap_notify:
>>>>>
>>>>> /*
>>>>>     if (size <= 0x10000) {
>>>>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>>>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
>>>>>         ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
>>>>>         if (iotlb->leaf) {
>>>>>             ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
>>>>>         }
>>>>>         ustruct.info.addr_info.archid = iotlb->arch_id;
>>>>>         ustruct.info.addr_info.addr = start;
>>>>>         ustruct.info.addr_info.granule_size = size;
>>>>>         ustruct.info.addr_info.nb_granules = 1;
>>>>>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size,
>>>>>                                         1, iotlb->leaf);
>>>>>     } else {
>>>>> */
>>>>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>>>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>>>>>         ustruct.info.pasid_info.archid = iotlb->arch_id;
>>>>>         ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
>>>>>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
>>>>> //  }
>>>>>
>>>>> This modification invalidates the whole ASID each time we get a guest
>>>>> TLBI, instead of invalidating the single IOVA. On my end, I saw this
>>>>> was the cause of this kind of issue. Please let me know if it fixes
>>>>> your perf issues
>>>>
>>>> Yes, this seems to fix the issue.
>>>>
>>>> root@ubuntu:~# iperf -c 10.202.225.185
>>>> ------------------------------------------------------------
>>>> Client connecting to 10.202.225.185, TCP port 5001
>>>> TCP window size: 85.0 KByte (default)
>>>> ------------------------------------------------------------
>>>> [  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port 5001
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
>>>> root@ubuntu:~#
>>>>
>>>> But the performance seems to be very poor given this is a 10 Gbps
>>>> interface (of course, invalidating the whole ASID may not be very
>>>> helpful). It is interesting why the single IOVA invalidation is not
>>>> working.
>>>>
>>>>> and then we may discuss the test configuration further.
>>>>
>>>> Sure. Please let me know.
>>>
>>> I reported that issue earlier on the ML. I have not been able to find
>>> any integration issue in the kernel/qemu code, but maybe I am too blind
>>> now as I wrote it ;-) When I get a guest stage1 TLBI I cascade it down
>>> to the physical IOMMU. I also pass the LEAF flag.
>>
>> Ok.
>>
>>> As you are an expert on the SMMUv3 PMU, if your implementation has one
>>> and you have cycles to look at this, it would be helpful to run it and
>>> see if anything weird gets highlighted.
>>
>> :). Sure. I will give it a try and report back if anything looks
>> suspicious.
>
> I just noted that CMDQ_OP_TLBI_NH_VA is missing the vmid field, which
> seems to be the cause of the single IOVA TLBI not working properly.
>
> I had this fix in arm-smmu-v3.c,
>
> @@ -947,6 +947,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
>  		break;
>  	case CMDQ_OP_TLBI_NH_VA:
> +		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);

Damn, I did not see that! That's it. ASID invalidation fills this field
indeed. You may post an independent patch for that.

>  		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
>  		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
>  		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
>
> With this, your original qemu branch is working.
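To make the encoding issue concrete: below is a minimal, self-contained
userspace sketch of how cmd[0] of a TLBI_NH_VA command is packed. It
mirrors the GENMASK_ULL/FIELD_PREP style field definitions from the
arm-smmu-v3 driver (opcode and field positions per the SMMUv3 spec), but
FIELD_PREP is reimplemented here for a userspace build and the concrete
VMID/ASID values are made up for illustration. Without the VMID line the
command carries VMID 0, so it does not match TLB entries tagged with the
guest's VMID in nested mode:

/* tlbi_encode.c - illustration only, not the kernel driver.
 * Build with: gcc -o tlbi_encode tlbi_encode.c
 */
#include <stdint.h>
#include <stdio.h>

/* Same semantics as the kernel's GENMASK_ULL(h, l): bits h..l set. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Simplified FIELD_PREP: shift val into the (contiguous) mask position.
 * __builtin_ctzll is gcc/clang-specific. */
#define FIELD_PREP(mask, val) \
	(((uint64_t)(val) << __builtin_ctzll(mask)) & (mask))

/* CMD_TLBI_NH_VA layout: opcode in [7:0], VMID in [47:32], ASID in
 * [63:48], per the SMMUv3 spec and the arm-smmu-v3 driver headers. */
#define CMDQ_0_OP          GENMASK_ULL(7, 0)
#define CMDQ_TLBI_0_VMID   GENMASK_ULL(47, 32)
#define CMDQ_TLBI_0_ASID   GENMASK_ULL(63, 48)
#define CMDQ_OP_TLBI_NH_VA 0x12

int main(void)
{
	uint64_t cmd0 = FIELD_PREP(CMDQ_0_OP, CMDQ_OP_TLBI_NH_VA);

	/* The line the fix adds: without it the command targets VMID 0
	 * and misses the guest's stage-1 TLB entries. */
	cmd0 |= FIELD_PREP(CMDQ_TLBI_0_VMID, 3); /* example guest VMID */
	cmd0 |= FIELD_PREP(CMDQ_TLBI_0_ASID, 5); /* example guest ASID */

	printf("TLBI_NH_VA cmd[0] = 0x%016llx\n",
	       (unsigned long long)cmd0);
	return 0;
}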
>
> root@ubuntu:~# iperf -c 10.202.225.185
> ------------------------------------------------------------
> Client connecting to 10.202.225.185, TCP port 5001
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.202.225.169 port 44894 connected with 10.202.225.185 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  3.21 GBytes  2.76 Gbits/sec
>
> Could you please check this...
>
> I also have a rebase of your patches on top of 5.4-rc5. This has some
> optimizations from Will, such as batched TLBI invalidations. Please
> find it here,
>
> https://github.com/hisilicon/kernel-dev/tree/private-vSMMUv3-v9-v5.4-rc5
>
> This gives me better performance with iperf,
>
> root@ubuntu:~# iperf -c 10.202.225.185
> ------------------------------------------------------------
> Client connecting to 10.202.225.185, TCP port 5001
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.202.225.169 port 55450 connected with 10.202.225.185 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  4.91 GBytes  4.22 Gbits/sec
> root@ubuntu:~#
>
> If possible please check this branch as well.

To be honest I don't really know what to do with this work. Despite the
efforts, it has suffered from a lack of traction in the community. My
last attempt to explain the use cases, upon Will's request at Plumbers,
has not received any comment (https://lkml.org/lkml/2019/9/20/104).

I think I will post a rebased version with your fix, as a way to get a
clean snapshot. If you think this work is useful for your projects,
please say so on the ML.

Thank you again!

Eric

> Thanks,
> Shameer
>
>> Thanks,
>> Shameer
>>
>>> Thanks
>>>
>>> Eric
>>>>
>>>> Cheers,
>>>> Shameer
>>>>
>>>>> Thanks
>>>>>
>>>>> Eric
>>>>>
>>>>>>
>>>>>> Please let me know.
>>>>>>
>>>>>> Thanks,
>>>>>> Shameer
>>>>>>
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Eric
>>>>>>>
>>>>>>> This series can be found at:
>>>>>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>>>>>>
>>>>>>> The series includes Tina's patch stemming from
>>>>>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
>>>>>>> specific irq" plus patches originally contributed by Yi.
>>>>>>>
>>>>>>> History:
>>>>>>>
>>>>>>> v8 -> v9:
>>>>>>> - introduce specific irq framework
>>>>>>> - single fault region
>>>>>>> - iommu_unregister_device_fault_handler failure case not handled
>>>>>>>   yet.
>>>>>>>
>>>>>>> v7 -> v8:
>>>>>>> - rebase on top of v5.2-rc1 and especially
>>>>>>>   8be39a1a04c1 iommu/arm-smmu-v3: Add a master->domain pointer
>>>>>>> - dynamic alloc of s1_cfg/s2_cfg
>>>>>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
>>>>>>> - check there are no HW MSI regions
>>>>>>> - asid invalidation using pasid extended struct (change in the uapi)
>>>>>>> - add s1_live/s2_live checks
>>>>>>> - move check about support of nested stages in domain finalise
>>>>>>> - fixes in error reporting according to the discussion with Robin
>>>>>>> - reordered the patches to have the iommu/smmuv3 patches first and
>>>>>>>   then the VFIO patches
>>>>>>>
>>>>>>> v6 -> v7:
>>>>>>> - removed device handle from bind/unbind_guest_msi
>>>>>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
>>>>>>>   enforcement"
>>>>>>> - added a few uapi comments as suggested by Jean, Jacob and Alex
>>>>>>>
>>>>>>> v5 -> v6:
>>>>>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
>>>>>>>
>>>>>>> v4 -> v5:
>>>>>>> - fix bug reported by Vincent: fault handler unregistration now
>>>>>>>   happens in vfio_pci_release
>>>>>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
>>>>>>>   uapi changes suggested by Jean-Philippe (except fetch_addr)
>>>>>>> - iommu: introduce device fault report API: removed the PRI part.
>>>>>>> - see individual logs for more details
>>>>>>> - reset the ste abort flag on detach
>>>>>>>
>>>>>>> v3 -> v4:
>>>>>>> - took into account Alex, Jean-Philippe and Robin's comments on v3
>>>>>>> - rework of the smmuv3 driver integration
>>>>>>> - add tear down ops for msi binding and PASID table binding
>>>>>>> - fix S1 fault propagation
>>>>>>> - put fault reporting patches at the beginning of the series
>>>>>>>   following Jean-Philippe's request
>>>>>>> - update of the cache invalidate and fault API uapis
>>>>>>> - VFIO fault reporting rework with 2 separate regions and one
>>>>>>>   mmappable segment for the fault queue
>>>>>>> - moved to PATCH
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - When registering the S1 MSI binding we now store the device
>>>>>>>   handle. This addresses Robin's comment about discrimination of
>>>>>>>   devices belonging to different S1 groups and using different
>>>>>>>   physical MSI doorbells.
>>>>>>> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX
>>>>>>>   to set the eventfd and expose the faults through an mmappable
>>>>>>>   fault region
>>>>>>>
>>>>>>> v1 -> v2:
>>>>>>> - Added the fault reporting capability
>>>>>>> - asid properly passed on invalidation (fix assignment of multiple
>>>>>>>   devices)
>>>>>>> - see individual change logs for more info
>>>>>>>
>>>>>>>
>>>>>>> Eric Auger (8):
>>>>>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
>>>>>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
>>>>>>>   vfio/pci: Register an iommu fault handler
>>>>>>>   vfio/pci: Allow to mmap the fault queue
>>>>>>>   vfio: Add new IRQ for DMA fault reporting
>>>>>>>   vfio/pci: Add framework for custom interrupt indices
>>>>>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
>>>>>>>   vfio: Document nested stage control
>>>>>>>
>>>>>>> Liu, Yi L (2):
>>>>>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
>>>>>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>>>>>>
>>>>>>> Tina Zhang (1):
>>>>>>>   vfio: Use capability chains to handle device specific irq
>>>>>>>
>>>>>>>  Documentation/vfio.txt              |  77 ++++++++
>>>>>>>  drivers/vfio/pci/vfio_pci.c         | 283 ++++++++++++++++++++++++++--
>>>>>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
>>>>>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
>>>>>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
>>>>>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++
>>>>>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
>>>>>>>  7 files changed, 747 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.20.1
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> kvmarm mailing list
>>>>>>> kvmarm@xxxxxxxxxxxxxxxxxxxxx
>>>>>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>>>
>