On Thu, 2013-03-21 at 06:55 -0700, Ganesh Narayanaswamy wrote:
> Hi Alex,
>
> Yes. They are PCIe devices which expose the PCIe functionality:
>
> -bash-4.1# lspci -vv -s 04:00
> ….
> Capabilities: [ac] Express (v2) Endpoint, MSI 00
>
> -bash-4.1# lspci -vv -s 03:00
> ….
> Capabilities: [80] Express (v2) Endpoint, MSI 00

Ok, so we're not hitting the obvious problem, where
pci_find_upstream_pcie_bridge thinks we're starting at a legacy PCI
device and expects there to be a PCIe-to-PCI bridge.  What about the PLX
switch ports, do they all have express capabilities?  Perhaps you can
provide lspci -vvv for the hierarchy to your FPGA device and just
exclude or obfuscate the FPGA devices themselves if they're somehow so
secret that we could learn something about them from config space
(unlikely).

Do the FPGA devices support some form of reset: express FLR, AF FLR, or
a soft reset on D3hot->D0?  Are there any dmesg entries prior to the
crash?  If KVM attempts to reset the device via a secondary bus reset on
the downstream switch port and that triggers a surprise hotplug, things
can get broken fast.  The downstream ports can be unbound from pciehp if
this is the problem.

> Is there any dependency issue here?  Does KVM expect the downstream
> ports of the PCIe switch to be passed through as well?

No, switch ports and bridges should never be attached to the guest.  Is
there some reason you're using -M q35?  It's still a bit fragile for
device assignment at this point.  Have you tried vfio-pci for doing the
assignment?  Thanks,

Alex

> On Mar 20, 2013, at 7:41 PM, Alex Williamson wrote:
>
> > On Tue, 2013-03-19 at 17:09 -0700, Ganesh Narayanaswamy wrote:
> >> Hi Alex,
> >>
> >> Thanks for your reply. The pci devices in question are proprietary FPGAs.
> >> Here is the lspci -tv output:
> >>
> >> -bash-4.1# lspci -tv
> >> -[0000:00]-+-00.0 Intel Corporation Sandy Bridge DRAM Controller
> >>            +-01.0-[01-04]----00.0-[02-04]--+-01.0-[03]----00.0 Broadcom Corporation Device b850
> >>            |                               \-02.0-[04]----00.0 Broadcom Corporation Device b850
> >>            +-01.1-[05]--
> >>            +-06.0-[06]--+-00.0 Intel Corporation Device 0434
> >>            |            +-00.1 Intel Corporation Device 0438
> >>            |            +-00.2 Intel Corporation Device 0438
> >>            |            +-00.3 Intel Corporation Device 0436
> >>            |            \-00.4 Intel Corporation Device 0436
> >>            +-1d.0 Intel Corporation Device 2334
> >>            +-1f.0 Intel Corporation Device 2310
> >>            +-1f.2 Intel Corporation Device 2323
> >>            +-1f.3 Intel Corporation Device 2330
> >>            +-1f.4 Intel Corporation Device 2331
> >>            +-1f.6 Intel Corporation Device 2332
> >>            \-1f.7 Intel Corporation Device 2360
> >>
> >> My qemu command line is as follows:
> >>
> >> qemu-system-x86_64 -M q35 --enable-kvm -m 2048 -nographic -vga std
> >> -usb -drive file=<IMG file>,if=none,id=drive-sata-disk0,format=raw
> >> -device ahci,id=ahci -device
> >> ide-drive,bus=ahci.0,drive=drive-sata-disk0,id=sata-disk0,bootindex=1
> >> -device pci-assign,host=04:00.0 -device pci-assign,host=03:00.0
> >>
> >>
> >> The PCIe bridge is a PLX 8613 device:
> >>
> >> 01:00.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
> >> 02:01.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
> >> 02:02.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
> >>
> >> As shown by the lspci -tv output, each of the PCI device being passed
> >> through is connected to one of the downstream ports of the PLX PCI
> >> bridge.
> >
> > Are your FPGAs actually PCIe devices (they must be because they connect
> > to a PCIe switch) that do not expose a PCIe capability?
> > For example,
> > lspci -v:
> >
> > Capabilities: [e0] Express Endpoint, MSI 00
> >
> > If so, they're in violation of the PCI Express specification and likely
> > the cause of this problem.  Thanks,
> >
> > Alex
> >
> >> On Mar 19, 2013, at 3:28 PM, Alex Williamson wrote:
> >>
> >>> On Tue, 2013-03-19 at 13:30 -0700, Ganesh Narayanaswamy wrote:
> >>>> Hi,
> >>>>
> >>>> I am running qemu with kvm and VT-d enabled and a couple of PCI
> >>>> devices assigned to the guest VM. Both host and guest are running
> >>>> linux 2.6 kernel.
> >>>>
> >>>> The passthrough works fine, but when I exit the VM, the host kernel
> >>>> crashes with the following backtrace:
> >>>>
> >>>> <4>[ 5569.836893] Process qemu-system-x86 (pid: 2925, threadinfo ffff8801f5f40000, task ffff88024fa28720)
> >>>> <0>[ 5569.944946] Stack:
> >>>> <4>[ 5569.968845] ffff8801f5f41aa8 ffffffff811a45fb ffff88024f04b680 ffff88024f049980
> >>>> <4>[ 5570.057156] ffff88024f04b680 ffff88024f049988 ffff8801f5f41b08 ffffffff811a6371
> >>>> <4>[ 5570.145470] ffff8801f5f41ad8 ffffffff81391045 0000000000000246 ffff88024f049990
> >>>> <0>[ 5570.233785] Call Trace:
> >>>> <4>[ 5570.262880] [<ffffffff811a45fb>] iommu_detach_dependent_devices+0x25/0x91
> >>>> <4>[ 5570.344958] [<ffffffff811a6371>] vm_domain_exit+0xf8/0x28b
> >>>> <4>[ 5570.411457] [<ffffffff81391045>] ? sub_preempt_count+0x92/0xa6
> >>>> <4>[ 5570.482106] [<ffffffff811a651a>] intel_iommu_domain_destroy+0x16/0x18
> >>>> <4>[ 5570.560030] [<ffffffff811fb5ea>] iommu_domain_free+0x16/0x22
> >>>> <4>[ 5570.628611] [<ffffffffa0006261>] kvm_iommu_unmap_guest+0x22/0x28 [kvm]
> >>>> <4>[ 5570.707570] [<ffffffffa0009b7b>] kvm_arch_destroy_vm+0x19/0x12a [kvm]
> >>>> <4>[ 5570.785492] [<ffffffffa0002614>] kvm_put_kvm+0xe6/0x129 [kvm]
> >>>> <4>[ 5570.855102] [<ffffffffa0002eb3>] kvm_vcpu_release+0x13/0x17 [kvm]
> >>>> <4>[ 5570.928867] [<ffffffff8109cdfc>] fput+0x117/0x1be
> >>>> <4>[ 5570.986013] [<ffffffff8109a147>] filp_close+0x63/0x6d
> >>>> <4>[ 5571.047314] [<ffffffff810342dd>] put_files_struct+0x6f/0xda
> >>>> <4>[ 5571.114845] [<ffffffff8103438e>] exit_files+0x46/0x4e
> >>>> <4>[ 5571.176145] [<ffffffff81035b3d>] do_exit+0x1fc/0x681
> >>>> <4>[ 5571.236416] [<ffffffffa000dedc>] ? kvm_arch_vcpu_ioctl_run+0xc2d/0xc55 [kvm]
> >>>> <4>[ 5571.321605] [<ffffffff8138cc41>] ? __mutex_lock_slowpath+0x26c/0x294
> >>>> <4>[ 5571.398490] [<ffffffff81036034>] do_group_exit+0x72/0x9a
> >>>> <4>[ 5571.462907] [<ffffffff8103fec9>] get_signal_to_deliver+0x331/0x350
> >>>> <4>[ 5571.537719] [<ffffffff81001f0f>] do_signal+0x6d/0x69a
> >>>> <4>[ 5571.599013] [<ffffffff811da1fc>] ? put_ldisc+0x92/0x97
> >>>> <4>[ 5571.661353] [<ffffffff810a95ea>] ? do_vfs_ioctl+0x527/0x576
> >>>> <4>[ 5571.728887] [<ffffffff81002563>] do_notify_resume+0x27/0x51
> >>>> <4>[ 5571.796419] [<ffffffff810a968c>] ? sys_ioctl+0x53/0x65
> >>>> <4>[ 5571.858758] [<ffffffff81002b9b>] int_signal+0x12/0x17
> >>>> <0>[ 5571.920058] Code: 48 85 d2 0f 95 c0 c9 c3 55 80 7f 4a 00 48 89 f8 48 89 e5 75 46 31 d2 48 8b 40 10 48 83 78 10 00 75 05 48 89 d0 eb 36 48 8b 40 38 <80> 78 4a 00 48 89 c2 74 e3 80 78 4b 07 74 23 80 3d 86 b5 5a 00
> >>>> <1>[ 5572.145516] RIP [<ffffffff81197f8c>] pci_find_upstream_pcie_bridge+0x23/0x57
> >>>> <4>[ 5572.230712] RSP <ffff8801f5f41a78>
> >>>>
> >>>> The two PCI devices in question are behind a PCIe bridge which is
> >>>> connected to the rootport. The crash seems to be happening when
> >>>> cleaning up the PCIe tree of the passed-through PCI devices. I tried
> >>>> passing through the downstream ports of the bridge as well, but that
> >>>> is not supported by qemu.
> >>>>
> >>>> Am I doing something wrong/unexpected here ? Any help in understanding
> >>>> this issue will help me fix the issue properly.
> >>>
> >>> Please provide 'sudo lspci -vvv' from the host and the qemu commandline
> >>> you're using.  Is the bridge by chance an asmedia device?  Thanks,
> >>>
> >>> Alex
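
For reference, here is a minimal sketch of the check behind the "Express
Endpoint" lines that lspci prints in this thread. It walks the standard
PCI capability list in a device's config space and returns the offset of
the PCI Express capability (ID 0x10), if there is one. The names
find_express_cap and read_config are made up for illustration and come
from neither the kernel nor pciutils. read_config assumes a Linux host
with sysfs and generally needs root to read past the first 64 bytes.

```python
from typing import Optional

PCI_STATUS = 0x06            # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10   # status bit 4: capability list is present
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10        # capability ID for PCI Express


def find_express_cap(config: bytes) -> Optional[int]:
    """Return the config-space offset of the Express capability, or None."""
    if len(config) < 0x40:
        return None
    status = config[PCI_STATUS] | (config[PCI_STATUS + 1] << 8)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guards against a malformed, looping capability chain
    while pos >= 0x40 and pos + 1 < len(config) and pos not in seen:
        seen.add(pos)
        if config[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = config[pos + 1] & 0xFC  # low two bits of the pointer are reserved
    return None


def read_config(bdf: str) -> bytes:
    """Read config space for e.g. '0000:04:00.0' from sysfs (hypothetical helper)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(256)
```

Calling find_express_cap(read_config("0000:04:00.0")) on each endpoint
and switch port in the hierarchy should agree with the bracketed offset
lspci -vv shows for the Express capability. A port for which it returns
None would be one that does not expose the capability.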