Re: Host kernel crash at pci_find_upstream_pcie_bridge on VM exit

On Thu, 2013-03-21 at 06:55 -0700, Ganesh Narayanaswamy wrote:
> Hi Alex,
> 
> Yes. They are PCIe devices which expose the PCIe functionality:
> 
> -bash-4.1# lspci -vv -s 04:00
> ….
> 	Capabilities: [ac] Express (v2) Endpoint, MSI 00
> 
> -bash-4.1# lspci -vv -s 03:00
> ….
> 	Capabilities: [80] Express (v2) Endpoint, MSI 00

Ok, so we're not hitting the obvious problem, where
pci_find_upstream_pcie_bridge thinks we're starting at a legacy PCI
device and expects there to be a PCIe-to-PCI bridge.  What about the PLX
switch ports, do they all have express capabilities?  Perhaps you can
provide lspci -vvv for the hierarchy to your FPGA device and just
exclude or obfuscate the FPGA devices themselves if they're somehow too
secret (though it's unlikely that we could learn anything about them
from config space).
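
One way to collect that output is to walk the device's sysfs path, since
every address-shaped component of the resolved path is a bridge or switch
port on the way to the device. A minimal sketch, assuming the usual
/sys/bus/pci/devices layout; the function names are made up for
illustration, and 0000:04:00.0 is just the address from this thread:

```shell
#!/bin/sh
# Print every PCI function on the path from the root bus to a device,
# given the device's resolved sysfs path (one address per line).
pci_chain_from_path() {
    # Split on '/' and keep only components shaped like dddd:bb:dd.f;
    # this drops "sys", "devices" and the "pci0000:00" host bridge node.
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# Dump lspci -vvv for each function in the chain. Run as root so the
# full capability list is readable.
dump_hierarchy() {
    path=$(readlink -f "/sys/bus/pci/devices/$1") || return 1
    for bdf in $(pci_chain_from_path "$path"); do
        lspci -vvv -s "$bdf"
    done
}
```

Calling `dump_hierarchy 0000:04:00.0 > hierarchy.txt` would cover the root
port, the PLX upstream and downstream ports, and the endpoint; the last
entry can be trimmed by hand if the FPGA config space should stay private.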

Do the FPGA devices support some form of reset: PCIe FLR, AF FLR, or a
soft reset on D3hot->D0?  Are there any dmesg entries prior to the
crash?  If KVM attempts to reset the device via a secondary bus reset on
the downstream switch port and that triggers a surprise hotplug, things
can get broken fast.  The downstream ports can be unbound from pciehp if
this is the problem.
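
Unbinding can be done through sysfs. A minimal sketch, assuming pciehp is
registered as a port service driver under /sys/bus/pci_express/drivers/pciehp
(worth checking on the host, since the service device names vary between
kernel versions); the function names and the optional sysfs-root argument
are illustrative only:

```shell
#!/bin/sh
# List the port service devices currently bound to pciehp for one port.
# $1 = port address (e.g. 0000:02:01.0), $2 = sysfs root (default /sys).
pciehp_services() {
    drv="${2:-/sys}/bus/pci_express/drivers/pciehp"
    # Match by address prefix instead of guessing the service suffix.
    for entry in "$drv/$1":*; do
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        basename "$entry"
    done
}

# Unbind each of those services from pciehp (needs root on a real host).
pciehp_unbind() {
    drv="${2:-/sys}/bus/pci_express/drivers/pciehp"
    for svc in $(pciehp_services "$1" "$2"); do
        echo "$svc" > "$drv/unbind"
    done
}
```

For the topology in this thread that would mean `pciehp_unbind 0000:02:01.0`
and `pciehp_unbind 0000:02:02.0` before starting the guest. If
`pciehp_services` prints nothing for a port, pciehp was not bound to it in
the first place.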

> Is there any dependency issue here? Does KVM expect the downstream ports of the PCIe switch to be passed through as well?

No, switch ports and bridges should never be attached to the guest.  Is
there some reason you're using -M q35?  It's still a bit fragile for
device assignment at this point.  Have you tried vfio-pci for doing the
assignment?  Thanks,

Alex
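
For reference, a minimal sketch of what trying vfio-pci might involve. It
assumes a host kernel and QEMU recent enough to include vfio (Linux 3.6 and
QEMU 1.3 or later, so newer than a 2.6 host), and it only prints the
commands rather than running them. The function name and the optional
sysfs-root argument are illustrative only:

```shell
#!/bin/sh
# Print the commands that would hand one device to vfio-pci.
# $1 = device address (e.g. 0000:04:00.0), $2 = sysfs root (default /sys).
vfio_bind_cmds() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    # sysfs reports IDs as 0x1234; new_id takes them without the prefix.
    vendor=$(sed 's/^0x//' "$dev/vendor") || return 1
    device=$(sed 's/^0x//' "$dev/device") || return 1
    echo "modprobe vfio-pci"
    # Only needed if a host driver is currently bound to the device.
    echo "echo $1 > /sys/bus/pci/devices/$1/driver/unbind"
    echo "echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id"
}
```

After reviewing and running the printed commands as root, the QEMU option
becomes `-device vfio-pci,host=04:00.0` in place of
`-device pci-assign,host=04:00.0`. vfio also requires every device in the
same IOMMU group to be bound to vfio-pci or left without a host driver, so
both FPGAs may need the same treatment.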

> On Mar 20, 2013, at 7:41 PM, Alex Williamson wrote:
> 
> > On Tue, 2013-03-19 at 17:09 -0700, Ganesh Narayanaswamy wrote:
> >> Hi Alex,
> >> 
> >> Thanks for your reply.  The pci devices in question are proprietary FPGAs.  Here is the lspci -tv output:
> >> 
> >> -bash-4.1# lspci -tv
> >> -[0000:00]-+-00.0  Intel Corporation Sandy Bridge DRAM Controller
> >>           +-01.0-[01-04]----00.0-[02-04]--+-01.0-[03]----00.0  Broadcom Corporation Device b850
> >>           |                               \-02.0-[04]----00.0  Broadcom Corporation Device b850
> >>           +-01.1-[05]--
> >>           +-06.0-[06]--+-00.0  Intel Corporation Device 0434
> >>           |            +-00.1  Intel Corporation Device 0438
> >>           |            +-00.2  Intel Corporation Device 0438
> >>           |            +-00.3  Intel Corporation Device 0436
> >>           |            \-00.4  Intel Corporation Device 0436
> >>           +-1d.0  Intel Corporation Device 2334
> >>           +-1f.0  Intel Corporation Device 2310
> >>           +-1f.2  Intel Corporation Device 2323
> >>           +-1f.3  Intel Corporation Device 2330
> >>           +-1f.4  Intel Corporation Device 2331
> >>           +-1f.6  Intel Corporation Device 2332
> >>           \-1f.7  Intel Corporation Device 2360
> >> 
> >> My qemu command line is as follows:
> >> 
> >> qemu-system-x86_64 -M q35 --enable-kvm -m 2048 -nographic -vga std
> >> -usb -drive file=<IMG file>,if=none,id=drive-sata-disk0,format=raw
> >> -device ahci,id=ahci -device
> >> ide-drive,bus=ahci.0,drive=drive-sata-disk0,id=sata-disk0,bootindex=1
> >> -device pci-assign,host=04:00.0 -device pci-assign,host=03:00.0
> >> 
> >> 
> >> The PCIe bridge is a PLX 8613 device:
> >> 
> >> 01:00.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
> >> 02:01.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
> >> 02:02.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
> >> 
> >> As shown by the lspci -tv output, each of the PCI devices being
> >> passed through is connected to one of the downstream ports of the
> >> PLX PCIe switch.
> > 
> > Are your FPGAs actually PCIe devices (they must be because they connect
> > to a PCIe switch) that do not expose a PCIe capability?  For example,
> > lspci -v:
> > 
> > 	Capabilities: [e0] Express Endpoint, MSI 00
> > 
> > If so, they're in violation of the PCI Express specification and likely
> > the cause of this problem.  Thanks,
> > 
> > Alex
> > 
> >> On Mar 19, 2013, at 3:28 PM, Alex Williamson wrote:
> >> 
> >>> On Tue, 2013-03-19 at 13:30 -0700, Ganesh Narayanaswamy wrote:
> >>>> Hi,
> >>>> 
> >>>> I am running qemu with kvm and VT-d enabled and a couple of PCI
> >>>> devices assigned to the guest VM. Both host and guest are running
> >>>> linux 2.6 kernel.  
> >>>> 
> >>>> The passthrough works fine, but when I exit the VM, the host kernel
> >>>> crashes with the following backtrace:
> >>>> 
> >>>> <4>[ 5569.836893] Process qemu-system-x86 (pid: 2925, threadinfo ffff8801f5f40000, task ffff88024fa28720)
> >>>> <0>[ 5569.944946] Stack:
> >>>> <4>[ 5569.968845]  ffff8801f5f41aa8 ffffffff811a45fb ffff88024f04b680 ffff88024f049980
> >>>> <4>[ 5570.057156]  ffff88024f04b680 ffff88024f049988 ffff8801f5f41b08 ffffffff811a6371
> >>>> <4>[ 5570.145470]  ffff8801f5f41ad8 ffffffff81391045 0000000000000246 ffff88024f049990
> >>>> <0>[ 5570.233785] Call Trace:
> >>>> <4>[ 5570.262880]  [<ffffffff811a45fb>] iommu_detach_dependent_devices+0x25/0x91
> >>>> <4>[ 5570.344958]  [<ffffffff811a6371>] vm_domain_exit+0xf8/0x28b
> >>>> <4>[ 5570.411457]  [<ffffffff81391045>] ? sub_preempt_count+0x92/0xa6
> >>>> <4>[ 5570.482106]  [<ffffffff811a651a>] intel_iommu_domain_destroy+0x16/0x18
> >>>> <4>[ 5570.560030]  [<ffffffff811fb5ea>] iommu_domain_free+0x16/0x22
> >>>> <4>[ 5570.628611]  [<ffffffffa0006261>] kvm_iommu_unmap_guest+0x22/0x28 [kvm]
> >>>> <4>[ 5570.707570]  [<ffffffffa0009b7b>] kvm_arch_destroy_vm+0x19/0x12a [kvm]
> >>>> <4>[ 5570.785492]  [<ffffffffa0002614>] kvm_put_kvm+0xe6/0x129 [kvm]
> >>>> <4>[ 5570.855102]  [<ffffffffa0002eb3>] kvm_vcpu_release+0x13/0x17 [kvm]
> >>>> <4>[ 5570.928867]  [<ffffffff8109cdfc>] fput+0x117/0x1be
> >>>> <4>[ 5570.986013]  [<ffffffff8109a147>] filp_close+0x63/0x6d
> >>>> <4>[ 5571.047314]  [<ffffffff810342dd>] put_files_struct+0x6f/0xda
> >>>> <4>[ 5571.114845]  [<ffffffff8103438e>] exit_files+0x46/0x4e
> >>>> <4>[ 5571.176145]  [<ffffffff81035b3d>] do_exit+0x1fc/0x681
> >>>> <4>[ 5571.236416]  [<ffffffffa000dedc>] ? kvm_arch_vcpu_ioctl_run+0xc2d/0xc55 [kvm]
> >>>> <4>[ 5571.321605]  [<ffffffff8138cc41>] ? __mutex_lock_slowpath+0x26c/0x294
> >>>> <4>[ 5571.398490]  [<ffffffff81036034>] do_group_exit+0x72/0x9a
> >>>> <4>[ 5571.462907]  [<ffffffff8103fec9>] get_signal_to_deliver+0x331/0x350
> >>>> <4>[ 5571.537719]  [<ffffffff81001f0f>] do_signal+0x6d/0x69a
> >>>> <4>[ 5571.599013]  [<ffffffff811da1fc>] ? put_ldisc+0x92/0x97
> >>>> <4>[ 5571.661353]  [<ffffffff810a95ea>] ? do_vfs_ioctl+0x527/0x576
> >>>> <4>[ 5571.728887]  [<ffffffff81002563>] do_notify_resume+0x27/0x51
> >>>> <4>[ 5571.796419]  [<ffffffff810a968c>] ? sys_ioctl+0x53/0x65
> >>>> <4>[ 5571.858758]  [<ffffffff81002b9b>] int_signal+0x12/0x17
> >>>> <0>[ 5571.920058] Code: 48 85 d2 0f 95 c0 c9 c3 55 80 7f 4a 00 48 89 f8 48 89 e5 75 46 31 d2 48 8b 40 10 48 83 78 10 00 75 05 48 89 d0 eb 36 48 8b 40 38 <80> 78 4a 00 48 89 c2 74 e3 80 78 4b 07 74 23 80 3d 86 b5 5a 00 
> >>>> <1>[ 5572.145516] RIP  [<ffffffff81197f8c>] pci_find_upstream_pcie_bridge+0x23/0x57
> >>>> <4>[ 5572.230712]  RSP <ffff8801f5f41a78>
> >>>> 
> >>>> The two PCI devices in question are behind a PCIe bridge which is
> >>>> connected to the rootport. The crash seems to be happening when
> >>>> cleaning up the PCIe tree of the passed-through PCI devices.  I tried
> >>>> passing through the downstream ports of the bridge as well, but that
> >>>> is not supported by qemu.
> >>>> 
> >>>> Am I doing something wrong/unexpected here ? Any help in understanding
> >>>> this issue will help me fix the issue properly.
> >>> 
> >>> Please provide 'sudo lspci -vvv' from the host and the qemu commandline
> >>> you're using.  Is the bridge by chance an asmedia device?  Thanks,
> >>> 
> >>> Alex
> >>> 
> >> 
> > 
> > 
> > 
> 



