CC'ing Anatoly (DPDK/vIOMMU engineer). More information inline.

Thanks Alex for your prompt response.

On Fri, Feb 9, 2018 at 12:02 PM, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> On Fri, 9 Feb 2018 10:31:57 -0800
> Ravi Kerur <rkerur@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> I am running into an issue when DPDK is started with intel_iommu=on
>> via GRUB command. Problem is not seen with regular kernel driver,
>> error messages show when DPDK is started and happens for both PF and
>> VF interfaces. Discussing with DPDK folks it is not a DPDK issue hence
>> sending an email to this list.
>>
>> Workaround is to use "iommu=pt" but I want iommu enabled in my setup.
>>
>> My understanding is that 'Address width' is reported as '46' on host
>> and '39' on guest and it could be causing the problem but not certain
>> so kindly let me know how to resolve this issue.
>>
>> I tried influencing guest address width with 'host-phys-bits=true and
>> phys-bits=46' properties when instantiating a guest but it still ends
>> with address width 39.
>>
>> I have tried 'x-aw-bits=48', still see DMAR errors.
>>
>> Following are the details
>>
>> (1) Linux kernel 4.9 (host and guest), Qemu 2.11
>>
>> (2) DPDK 17.05
>>
>> (3) IXGBE details
>> ethtool -i enp4s0f0 (PF driver)
>> driver: ixgbe
>> version: 5.3.3
>> firmware-version: 0x800007b8, 1.1018.0
>> bus-info: 0000:04:00.0
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: yes
>>
>> ethtool -i enp4s16f2 (VF driver)
>> driver: ixgbevf
>> version: 4.3.2
>> firmware-version:
>> bus-info: 0000:04:10.2
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: no
>> supports-register-dump: yes
>> supports-priv-flags: no
>>
>> Bus info          Device     Class    Description
>> =========================================================
>> pci@0000:01:00.0  ens11f0    network  82599ES 10-Gigabit SFI/SFP+ Network Connection
>> pci@0000:01:00.1  ens11f1    network  82599ES 10-Gigabit SFI/SFP+ Network Connection
>> pci@0000:04:00.0  enp4s0f0   network  82599ES 10-Gigabit SFI/SFP+ Network Connection
>> pci@0000:04:00.1  enp4s0f1   network  82599ES 10-Gigabit SFI/SFP+ Network Connection
>> pci@0000:04:10.0  enp4s16    network  Illegal Vendor ID
>> pci@0000:04:10.2  enp4s16f2  network  Illegal Vendor ID
>>
>> (5) Kernel dmesg on host
>>
>> dmesg | grep -e DMAR -e IOMMU
>> [ 0.000000] ACPI: DMAR 0x000000007999BAD0 0000E0 (v01 ALASKA A M I 00000001 INTL 20091013)
>> [ 0.000000] DMAR: IOMMU enabled
>> [ 0.519368] DMAR: Host address width 46
>> [ 0.527243] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
>> [ 0.538073] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020df
>> [ 0.554253] DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
>> [ 0.565074] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020df
>> [ 0.581259] DMAR: RMRR base: 0x0000007bbc6000 end: 0x0000007bbd4fff
>> [ 0.593980] DMAR: ATSR flags: 0x0
>> [ 0.600802] DMAR: RHSA base: 0x000000c7ffc000 proximity domain: 0x0
>> [ 0.613532] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x1
>> [ 0.626265] DMAR-IR: IOAPIC id 3 under DRHD base 0xfbffc000 IOMMU 0
>> [ 0.639177] DMAR-IR: IOAPIC id 1 under DRHD base 0xc7ffc000 IOMMU 1
>> [ 0.652089] DMAR-IR: IOAPIC id 2 under DRHD base 0xc7ffc000 IOMMU 1
>> [ 0.664996] DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
>> [ 0.675984] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
>> [ 0.694475] DMAR-IR: Enabled IRQ remapping in x2apic mode
>> [ 9.637093] DMAR: dmar1: Using Queued invalidation
>> [ 9.646945] DMAR: Setting RMRR:
>> [ 9.653942] DMAR: Setting identity map for device 0000:00:1d.0 [0x7bbc6000 - 0x7bbd4fff]
>> [ 9.670513] DMAR: Prepare 0-16MiB unity mapping for LPC
>> [ 9.681656] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
>> [ 9.696630] DMAR: Intel(R) Virtualization Technology for Directed I/O
>> [ 2605.450811] DMAR: DRHD: handling fault status reg 2
>> [ 2605.450814] DMAR: [DMA Read] Request device [04:10.0] fault addr 33a128000 [fault reason 06] PTE Read access is not set
>> [ 2607.450907] DMAR: DRHD: handling fault status reg 102
>> [ 2607.450910] DMAR: [DMA Read] Request device [04:10.0] fault addr 33a128000 [fault reason 06] PTE Read access is not set
>> [ 4539.597735] DMAR: DRHD: handling fault status reg 202
>> [ 4539.597737] DMAR: [DMA Read] Request device [04:10.0] fault addr 33a128000 [fault reason 06] PTE Read access is not set
>
> It's a read fault to address 0x33a128000, just below 13G. I don't see
> why that would have anything to do with the emulated VT-d address
> width, we're well under 39bits. Is the VM configured with enough
> memory that this would be a valid guest physical address? Is the VM
> configured with more than 39bits of physical address space (1TB)?
>

The guest was instantiated with the following command. I have tried both
16G and 64G; the issue is seen with both. I have also tried with and
without hugepages (no -mem-path -mem-prealloc mlock); the same issue is
seen.

/rk-qemu-2.11/x86_64-softmmu/qemu-system-x86_64 -enable-kvm \
  -M q35,accel=kvm,kernel-irqchip=split \
  -object iothread,id=iothread0 \
  -device intel-iommu,intremap=on,device-iotlb=on,caching-mode=on \
  -cpu host \
  -daemonize \
  -m 16G \
  -mem-prealloc \
  -mem-path /dev/hugepages_1GB \
  -realtime mlock=on \
  -smp 14 \
  -uuid 0fc91c66-f0b1-11e7-acf4-525400123456 \
  -name '212748-sriov-ravi-smac-alpha-SMAC10' \
  -device ioh3420,id=root.1,chassis=1 \
  -device ioh3420,id=root.2,chassis=2 \
  -netdev tap,vhost=on,queues=2,ifname=vn-vn2_1_,downscript=no,id=vn-vn2_1_,script=no \
  -device ioh3420,id=root.3,chassis=3 \
  -device virtio-net-pci,netdev=vn-vn2_1_,bus=root.3,ats=on,mq=on,vectors=6,mac=DE:AD:02:88:10:37,id=vn-vn2_1__dev \
  -netdev tap,vhost=on,queues=2,ifname=vn-vn92_1_,downscript=no,id=vn-vn92_1_,script=no \
  -device ioh3420,id=root.4,chassis=4 \
  -device virtio-net-pci,mac=DE:AD:02:88:10:38,netdev=vn-vn92_1_,bus=root.4,ats=on,mq=on,vectors=6,id=vn-vn92_1__dev \
  -netdev tap,vhost=on,queues=2,ifname=vn-vn93_1_,downscript=no,id=vn-vn93_1_,script=no \
  -device ioh3420,id=root.5,chassis=5 \
  -device virtio-net-pci,mac=DE:AD:02:88:10:39,netdev=vn-vn93_1_,bus=root.5,ats=on,mq=on,vectors=6,id=vn-vn93_1__dev \
  -vnc :16,websocket=15916 \
  -qmp tcp:127.0.0.1:12001,server,nowait \
  -chardev socket,id=charmonitor,path=/tmp/mon.12001,server,nowait \
  -mon chardev=charmonitor,id=monitor \
  -cdrom /var/venom/cloud_init/0fc91c66-f0b1-11e7-acf4-525400123456.iso \
  -device vfio-pci,host=0000:04:10.0 \
  -drive file=/var/venom/instance_repo/test.img,if=none,id=drive-virtio-disk0,format=raw,aio=native,cache=none \
  -balloon none \
  -device virtio-blk-pci,scsi=off,iothread=iothread0,drive=drive-virtio-disk0,id=virtio-disk0,bus=root.1,ats=on,bootindex=1

> When you say that iommu=pt is a workaround, is that done in the host or
> guest? I don't see why it would particularly matter for either since
> the device is removed from the passthrough domain for assignment to the
> guest and for use in guest userspace with vfio/dpdk, so long as you're
> not somehow using no-iommu in the guest even though you have an iommu.

I am using the IOMMU in the guest. The guest kernel is compiled with the
following config, so I don't think there is a way for me to use
no-iommu + vfio inside the guest. With iommu=pt I can use the 'igb_uio'
DPDK driver, which doesn't go through DMA remapping; however, I want the
IOMMU enabled and want to use vfio-pci.

CONFIG_GART_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_VFIO_IOMMU_TYPE1=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
# Generic IOMMU Pagetable Support
CONFIG_IOMMU_IOVA=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=m
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_SVM=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_IOMMU_STRESS is not set

'lscpu' on my system

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-27
Off-line CPU(s) list:  28-55
Thread(s) per core:    1
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
Stepping:              2
CPU MHz:               1771.240
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              3999.80
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13
NUMA node1 CPU(s):     14-27

>
> What makes the DPDK folks confident that this isn't a driver bug?
> AFAICT, a stray read from the driver would generate exactly this sort
> of log. It's also possible that VT-d emulation hit a bug and didn't
> map this page correctly, but what's so unique about this page? (Cc
> PeterX). Thanks,
>

CC'd Anatoly (DPDK+vIOMMU engineer who has been helping me on the dpdk
mailing list with this issue).

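For reference, the arithmetic behind the address-width question can be checked with a few lines of Python. This is only a sketch: the fault address and the 39-bit guest width are taken from the logs in this thread, while the helper name `bits_needed` and the constant names are hypothetical and exist purely for illustration.

```python
# Sanity check on the faulting address reported in the host dmesg.
FAULT_ADDR = 0x33a128000   # "fault addr 33a128000" from the host log
GUEST_AW_BITS = 39         # "Host address width 39" from the guest dmesg
GIB = 1 << 30


def bits_needed(addr: int) -> int:
    """Return the number of address bits required to represent addr."""
    return addr.bit_length()


fault_gib = FAULT_ADDR / GIB                 # fault address expressed in GiB
limit_gib = (1 << GUEST_AW_BITS) // GIB      # size of a 39-bit address space in GiB

print(f"fault address: {FAULT_ADDR:#x} ({fault_gib:.2f} GiB)")
print(f"bits needed  : {bits_needed(FAULT_ADDR)}")
print(f"39-bit limit : {limit_gib} GiB")
print(f"within limit : {FAULT_ADDR < (1 << GUEST_AW_BITS)}")
```

The address needs 34 bits and sits just under 13 GiB, so by itself it does not exceed what a 39-bit address width can express.
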
Thanks.

> Alex
>
>> (6) dmesg on guest
>> # dmesg | grep -e DMAR -e IOMMU
>> [ 0.000000] ACPI: DMAR 0x000000007FFE201D 000050 (v01 BOCHS BXPCDMAR 00000001 BXPC 00000001)
>> [ 0.000000] DMAR: IOMMU enabled
>> [ 1.387988] DMAR: Host address width 39
>> [ 1.389203] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
>> [ 1.390692] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 12008c22260286 ecap f00f5e
>> [ 1.393099] DMAR: ATSR flags: 0x1
>> [ 1.394257] DMAR-IR: IOAPIC id 0 under DRHD base 0xfed90000 IOMMU 0
>> [ 1.395891] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
>> [ 1.400856] DMAR-IR: Enabled IRQ remapping in x2apic mode
>> [ 3.719211] DMAR: No RMRR found
>> [ 3.729983] DMAR: dmar0: Using Queued invalidation
>> [ 3.731395] DMAR: Setting RMRR:
>> [ 3.732467] DMAR: Prepare 0-16MiB unity mapping for LPC
>> [ 3.734099] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
>> [ 4.802391] DMAR: Intel(R) Virtualization Technology for Directed I/O
>>
>> Thanks.
>
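
The "cap" values printed in the host and guest dmesg output can be decoded to see which address widths each IOMMU advertises. The sketch below assumes the VT-d Capability Register layout that the Linux intel-iommu driver uses in its cap_sagaw() and cap_mgaw() macros (caching mode in bit 7, SAGAW in bits 12:8, MGAW in bits 21:16). The function name `decode_cap` and the table `SAGAW_WIDTHS` are hypothetical, and only the 3-, 4- and 5-level SAGAW encodings are decoded.

```python
# cap values copied from the dmesg output quoted in this thread
HOST_CAP = 0xd2078c106f0466    # host dmar0/dmar1
GUEST_CAP = 0x12008c22260286   # guest dmar0 (emulated intel-iommu)

# Assumed mapping of SAGAW bit position to adjusted guest address width (bits)
SAGAW_WIDTHS = {1: 39, 2: 48, 3: 57}


def decode_cap(cap: int) -> dict:
    """Extract the address-width related fields from a VT-d cap value."""
    sagaw = (cap >> 8) & 0x1f
    return {
        # MGAW field stores (maximum guest address width - 1)
        "mgaw": ((cap >> 16) & 0x3f) + 1,
        # every page-table width the hardware (or emulation) supports
        "sagaw": sorted(w for bit, w in SAGAW_WIDTHS.items() if sagaw & (1 << bit)),
        # caching mode, expected on the emulated IOMMU with caching-mode=on
        "caching_mode": bool((cap >> 7) & 1),
    }


for name, cap in (("host", HOST_CAP), ("guest", GUEST_CAP)):
    print(name, decode_cap(cap))
```

Under that layout the host register decodes to a 48-bit width and the guest register to a 39-bit width with caching mode set. The "Host address width" lines in dmesg are printed from the ACPI DMAR table rather than from this register, which would explain the host showing 46 there.
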