On Fri, 9 Feb 2018 13:25:31 -0800
Ravi Kerur <rkerur@xxxxxxxxx> wrote:

> CC'ing Anatoly(DPDK/vIOMMU engineer). More information inline. Thanks
> Alex for your prompt response.
>
>
> On Fri, Feb 9, 2018 at 12:02 PM, Alex Williamson
> <alex.williamson@xxxxxxxxxx> wrote:
> > On Fri, 9 Feb 2018 10:31:57 -0800
> > Ravi Kerur <rkerur@xxxxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> I am running into an issue when DPDK is started with intel_iommu=on
> >> via GRUB command. Problem is not seen with regular kernel driver,
> >> error messages show when DPDK is started and happens for both PF and
> >> VF interfaces. Discussing with DPDK folks it is not a DPDK issue hence
> >> sending an email to this list.
> >>
> >> Workaround is to use "iommu=pt" but I want iommu enabled in my setup.
> >>
> >> My understanding is that 'Address width' is reported as '46' on host
> >> and '39' on guest and it could be causing the problem but not certain
> >> so kindly let me know how to resolve this issue.
> >>
> >> I tried influencing guest address width with 'host-phys-bits=true and
> >> phys-bits=46' properties when instantiating a guest but it still ends
> >> with address width 39.
> >>
> >> I have tried 'x-aw-bits=48', still see DMAR errors.
> >>
> >> Following are the details
> >>
> >> (1) Linux kernel 4.9 (host and guest), Qemu 2.11
> >>
> >> (2) DPDK 17.05
> >>
> >> (3) IXGBE details
> >> ethtool -i enp4s0f0 (PF driver)
> >> driver: ixgbe
> >> version: 5.3.3
> >> firmware-version: 0x800007b8, 1.1018.0
> >> bus-info: 0000:04:00.0
> >> supports-statistics: yes
> >> supports-test: yes
> >> supports-eeprom-access: yes
> >> supports-register-dump: yes
> >> supports-priv-flags: yes
> >>
> >> ethtool -i enp4s16f2 (VF driver)
> >> driver: ixgbevf
> >> version: 4.3.2
> >> firmware-version:
> >> bus-info: 0000:04:10.2
> >> supports-statistics: yes
> >> supports-test: yes
> >> supports-eeprom-access: no
> >> supports-register-dump: yes
> >> supports-priv-flags: no
> >>
> >> Bus info          Device     Class    Description
> >> =========================================================
> >> pci@0000:01:00.0  ens11f0    network  82599ES 10-Gigabit
> >> SFI/SFP+ Network Connection
> >> pci@0000:01:00.1  ens11f1    network  82599ES 10-Gigabit
> >> SFI/SFP+ Network Connection
> >> pci@0000:04:00.0  enp4s0f0   network  82599ES 10-Gigabit
> >> SFI/SFP+ Network Connection
> >> pci@0000:04:00.1  enp4s0f1   network  82599ES 10-Gigabit
> >> SFI/SFP+ Network Connection
> >> pci@0000:04:10.0  enp4s16    network  Illegal Vendor ID
> >> pci@0000:04:10.2  enp4s16f2  network  Illegal Vendor ID
> >>
> >> (5) Kernel dmesg on host
> >>
> >> dmesg | grep -e DMAR -e IOMMU
> >> [ 0.000000] ACPI: DMAR 0x000000007999BAD0 0000E0 (v01 ALASKA A M I
> >> 00000001 INTL 20091013)
> >> [ 0.000000] DMAR: IOMMU enabled
> >> [ 0.519368] DMAR: Host address width 46
> >> [ 0.527243] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
> >> [ 0.538073] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap
> >> d2078c106f0466 ecap f020df
> >> [ 0.554253] DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
> >> [ 0.565074] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap
> >> d2078c106f0466 ecap f020df
> >> [ 0.581259] DMAR: RMRR base: 0x0000007bbc6000 end: 0x0000007bbd4fff
> >> [ 0.593980] DMAR: ATSR flags: 0x0
> >> [ 0.600802] DMAR: RHSA base: 0x000000c7ffc000 proximity domain: 0x0
> >> [ 0.613532] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x1
> >> [ 0.626265] DMAR-IR: IOAPIC id 3 under DRHD base 0xfbffc000 IOMMU 0
> >> [ 0.639177] DMAR-IR: IOAPIC id 1 under DRHD base 0xc7ffc000 IOMMU 1
> >> [ 0.652089] DMAR-IR: IOAPIC id 2 under DRHD base 0xc7ffc000 IOMMU 1
> >> [ 0.664996] DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
> >> [ 0.675984] DMAR-IR: Queued invalidation will be enabled to support
> >> x2apic and Intr-remapping.
> >> [ 0.694475] DMAR-IR: Enabled IRQ remapping in x2apic mode
> >> [ 9.637093] DMAR: dmar1: Using Queued invalidation
> >> [ 9.646945] DMAR: Setting RMRR:
> >> [ 9.653942] DMAR: Setting identity map for device 0000:00:1d.0
> >> [0x7bbc6000 - 0x7bbd4fff]
> >> [ 9.670513] DMAR: Prepare 0-16MiB unity mapping for LPC
> >> [ 9.681656] DMAR: Setting identity map for device 0000:00:1f.0 [0x0
> >> - 0xffffff]
> >> [ 9.696630] DMAR: Intel(R) Virtualization Technology for Directed I/O
> >> [ 2605.450811] DMAR: DRHD: handling fault status reg 2
> >> [ 2605.450814] DMAR: [DMA Read] Request device [04:10.0] fault addr
> >> 33a128000 [fault reason 06] PTE Read access is not set
> >> [ 2607.450907] DMAR: DRHD: handling fault status reg 102
> >> [ 2607.450910] DMAR: [DMA Read] Request device [04:10.0] fault addr
> >> 33a128000 [fault reason 06] PTE Read access is not set
> >> [ 4539.597735] DMAR: DRHD: handling fault status reg 202
> >> [ 4539.597737] DMAR: [DMA Read] Request device [04:10.0] fault addr
> >> 33a128000 [fault reason 06] PTE Read access is not set
> >
> > It's a read fault to address 0x33a128000, just below 13G. I don't see
> > why that would have anything to do with the emulated VT-d address
> > width, we're well under 39bits. Is the VM configured with enough
> > memory that this would be a valid guest physical address? Is the VM
> > configured with more than 39bits of physical address space (512GB)?
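
The arithmetic behind that observation can be checked with a minimal sketch. The helper name describe_addr is made up for illustration; the only inputs taken from this thread are the fault address from the host dmesg and the 39-bit width the guest reports.

```python
GIB = 1 << 30  # bytes per GiB


def describe_addr(addr: int, width_bits: int) -> dict:
    """Summarize where addr sits relative to a width_bits-wide address space."""
    return {
        "gib": addr / GIB,                      # address expressed in GiB
        "bits_needed": addr.bit_length(),       # minimum bits to represent addr
        "limit_gib": (1 << width_bits) // GIB,  # size of the address space in GiB
        "fits": addr < (1 << width_bits),       # True if addr is inside that space
    }


# Fault address reported by the host, checked against a 39-bit width.
info = describe_addr(0x33A128000, 39)
print(info)
```

The fault address needs far fewer than 39 bits, which is why the emulated address width alone does not look like the cause.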
> >
>
> Guest was instantiated with the following command. I have tried both
> 16G and 64G issue is seen. I have tried with/without hugepages (no
> -mem-path -mem-prealloc mlock) same issue is seen.

So 0x33a128000 is likely a valid guest physical address or IOVA to be
programmed through the IOMMU, it's simply not mapped, but is it not
mapped because of a DPDK bug or a VT-d emulation bug... I don't know,
but I have no reason to exclude either.

> /rk-qemu-2.11/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -M
> q35,accel=kvm,kernel-irqchip=split -object iothread,id=iothread0
> -device intel-iommu,intremap=on,device-iotlb=on,caching-mode=on -cpu
> host -daemonize -m 16G -mem-prealloc -mem-path /dev/hugepages_1GB
> -realtime mlock=on -smp 14 -uuid 0fc91c66-f0b1-11e7-acf4-525400123456
> -name '212748-sriov-ravi-smac-alpha-SMAC10' -device
> ioh3420,id=root.1,chassis=1 -device ioh3420,id=root.2,chassis=2
> -netdev tap,vhost=on,queues=2,ifname=vn-vn2_1_,downscript=no,id=vn-vn2_1_,script=no
> -device ioh3420,id=root.3,chassis=3 -device
> virtio-net-pci,netdev=vn-vn2_1_,bus=root.3,ats=on,mq=on,vectors=6,mac=DE:AD:02:88:10:37,id=vn-vn2_1__dev
> -netdev tap,vhost=on,queues=2,ifname=vn-vn92_1_,downscript=no,id=vn-vn92_1_,script=no
> -device ioh3420,id=root.4,chassis=4 -device
> virtio-net-pci,mac=DE:AD:02:88:10:38,netdev=vn-vn92_1_,bus=root.4,ats=on,mq=on,vectors=6,id=vn-vn92_1__dev
> -netdev tap,vhost=on,queues=2,ifname=vn-vn93_1_,downscript=no,id=vn-vn93_1_,script=no
> -device ioh3420,id=root.5,chassis=5 -device
> virtio-net-pci,mac=DE:AD:02:88:10:39,netdev=vn-vn93_1_,bus=root.5,ats=on,mq=on,vectors=6,id=vn-vn93_1__dev
> -vnc :16,websocket=15916 -qmp tcp:127.0.0.1:12001,server,nowait
> -chardev socket,id=charmonitor,path=/tmp/mon.12001,server,nowait -mon
> chardev=charmonitor,id=monitor -cdrom
> /var/venom/cloud_init/0fc91c66-f0b1-11e7-acf4-525400123456.iso -device
> vfio-pci,host=0000:04:10.0 -drive
> file=/var/venom/instance_repo/test.img,if=none,id=drive-virtio-disk0,format=raw,aio=native,cache=none
> -balloon none -device
> virtio-blk-pci,scsi=off,iothread=iothread0,drive=drive-virtio-disk0,id=virtio-disk0,bus=root.1,ats=on,bootindex=1
>
> > When you say that iommu=pt is a workaround, is that done in the host or
> > guest? I don't see why it would particularly matter for either since
> > the device is removed from the passthrough domain for assignment to the
> > guest and for use in guest userspace with vfio/dpdk, so long as you're
> > not somehow using no-iommu in the guest even though you have an iommu.
>
> I am using iommu in guest. Guest kernel is compiled with following
> config, I don't think there is a way for me to use no-iommu + vfio
> inside guest. With iommu=pt I can use 'igb_uio' dpdk driver which
> doesn't go through dma remapping, however, I want iommu enabled and
> use vfio-pci.

Ok, so iommu=pt is in the guest, but it's a bit of a red herring
because at the same time you're switching to igb_uio in the guest. In
that case all of guest memory is mapped through the IOMMU for the
device, so it doesn't disprove anything, it's a completely different
operating model. When using vfio-pci the device operates in a separate
IOMMU domain and it's the guest driver's responsibility to program
every DMA address that the device needs to access through the vfio
IOMMU interface. If the DPDK driver is not mapping IOVA 0x33a128000 for
the device, we can stop right there, this is a driver bug. If DPDK is
mapping that address, then maybe something bad is happening in
translating that programming down through the emulated IOMMU, through
the vfio IOMMU interface, and down to real hardware.
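
The vfio-pci model described above can be pictured with a toy sketch. This is only an illustration, not the VFIO API: ToyDomain, its methods and the example mapping at 0x100000000 are all hypothetical. A real userspace driver would instead issue the VFIO_IOMMU_MAP_DMA ioctl on the container for each region the device must reach.

```python
from dataclasses import dataclass, field

PAGE = 4096  # assume 4 KiB IOMMU pages for this toy model


@dataclass
class ToyDomain:
    """Toy per-device IOMMU domain: IOVA page number -> (read, write)."""
    pages: dict = field(default_factory=dict)

    def map(self, iova: int, size: int, read: bool = True, write: bool = True) -> None:
        """Record a mapping for every page touched by [iova, iova + size)."""
        if size <= 0:
            raise ValueError("size must be positive")
        first = iova // PAGE
        last = (iova + size - 1) // PAGE
        for page in range(first, last + 1):
            self.pages[page] = (read, write)

    def dma_read(self, iova: int) -> str:
        """Return "ok" if the page is mapped readable, otherwise "fault"."""
        perms = self.pages.get(iova // PAGE)
        if perms is None or not perms[0]:
            return "fault"
        return "ok"


domain = ToyDomain()
domain.map(0x100000000, 2 * PAGE)    # hypothetical buffer the driver mapped
print(domain.dma_read(0x100000000))  # inside the mapped range
print(domain.dma_read(0x33A128000))  # never mapped by the driver in this model
```

In this model nothing is reachable by default, so any address the driver forgets to map produces a read fault of the kind seen in the host log.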
Thanks,

Alex

>
> CONFIG_GART_IOMMU=y
> # CONFIG_CALGARY_IOMMU is not set
> CONFIG_IOMMU_HELPER=y
> CONFIG_VFIO_IOMMU_TYPE1=m
> # CONFIG_VFIO_NOIOMMU is not set
> CONFIG_IOMMU_API=y
> CONFIG_IOMMU_SUPPORT=y
> # Generic IOMMU Pagetable Support
> CONFIG_IOMMU_IOVA=y
> CONFIG_AMD_IOMMU=y
> CONFIG_AMD_IOMMU_V2=m
> CONFIG_INTEL_IOMMU=y
> CONFIG_INTEL_IOMMU_SVM=y
> # CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
> CONFIG_INTEL_IOMMU_FLOPPY_WA=y
> # CONFIG_IOMMU_DEBUG is not set
> # CONFIG_IOMMU_STRESS is not set
>
> 'lscpu' on my system
>
> lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                56
> On-line CPU(s) list:   0-27
> Off-line CPU(s) list:  28-55
> Thread(s) per core:    1
> Core(s) per socket:    14
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Model name:            Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
> Stepping:              2
> CPU MHz:               1771.240
> CPU max MHz:           3000.0000
> CPU min MHz:           1200.0000
> BogoMIPS:              3999.80
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              35840K
> NUMA node0 CPU(s):     0-13
> NUMA node1 CPU(s):     14-27
>
>
> >
> > What makes the DPDK folks confident that this isn't a driver bug?
> > AFAICT, a stray read from the driver would generate exactly this sort
> > of log. It's also possible that VT-d emulation hit a bug and didn't
> > map this page correctly, but what's so unique about this page? (Cc
> > PeterX). Thanks,
> >
>
> CC'd Anatoly (DPDK+vIOMMU engineer who has been helping me on dpdk
> mailing list with this issue).
>
> Thanks.
>
> > Alex
> >
> >> (6) dmesg on guest
> >> # dmesg | grep -e DMAR -e IOMMU
> >> [ 0.000000] ACPI: DMAR 0x000000007FFE201D 000050 (v01 BOCHS
> >> BXPCDMAR 00000001 BXPC 00000001)
> >> [ 0.000000] DMAR: IOMMU enabled
> >> [ 1.387988] DMAR: Host address width 39
> >> [ 1.389203] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
> >> [ 1.390692] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap
> >> 12008c22260286 ecap f00f5e
> >> [ 1.393099] DMAR: ATSR flags: 0x1
> >> [ 1.394257] DMAR-IR: IOAPIC id 0 under DRHD base 0xfed90000 IOMMU 0
> >> [ 1.395891] DMAR-IR: Queued invalidation will be enabled to support
> >> x2apic and Intr-remapping.
> >> [ 1.400856] DMAR-IR: Enabled IRQ remapping in x2apic mode
> >> [ 3.719211] DMAR: No RMRR found
> >> [ 3.729983] DMAR: dmar0: Using Queued invalidation
> >> [ 3.731395] DMAR: Setting RMRR:
> >> [ 3.732467] DMAR: Prepare 0-16MiB unity mapping for LPC
> >> [ 3.734099] DMAR: Setting identity map for device 0000:00:1f.0 [0x0
> >> - 0xffffff]
> >> [ 4.802391] DMAR: Intel(R) Virtualization Technology for Directed I/O
> >>
> >> Thanks.
> >
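
The "cap" values in the two dmesg logs can be decoded to compare the address widths each IOMMU advertises. This is a minimal sketch. It assumes the VT-d capability register layout (SAGAW in bits 12:8, MGAW in bits 21:16, stored as width minus one), which is also what the cap_sagaw and cap_mgaw macros in the Linux driver extract. The function name decode_cap is made up for illustration.

```python
def decode_cap(cap: int) -> dict:
    """Extract the address-width fields from a VT-d capability register value."""
    sagaw = (cap >> 8) & 0x1F          # supported adjusted guest address widths
    mgaw = ((cap >> 16) & 0x3F) + 1    # maximum guest address width, in bits
    # SAGAW bit 1 -> 39-bit (3-level), bit 2 -> 48-bit (4-level),
    # bit 3 -> 57-bit (5-level) page tables.
    widths = [w for bit, w in ((1, 39), (2, 48), (3, 57)) if sagaw & (1 << bit)]
    return {"mgaw": mgaw, "sagaw_widths": widths}


guest = decode_cap(0x12008C22260286)  # "cap" from the guest dmesg
host = decode_cap(0xD2078C106F0466)   # "cap" from the host dmesg
print("guest:", guest)
print("host: ", host)
```

Under that assumed layout the emulated unit offers only 39-bit tables while the host hardware offers 48-bit ones. Both are far wider than the faulting address needs.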