Re: pci_enable_msix() fails with ENOMEM/EINVAL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2012-12-12 at 15:42 +0200, Alex Lyakas wrote:
> Hi Alex,
> The kernel am using is stock Ubuntu-Precise (as I mentioned) 
> 3.2.0-29-generic #46.
> 
> I debugged deeper in pci_enable_msix() path, and I believe I have tracked 
> down the source of the different errors I am seeing. The issue I see is that 
> pci_find_capability() *sometimes* fails to find PCI_CAP_ID_MSIX capability. 
> I tracked this down to __pci_find_next_cap_ttl() not being able to find the 
> needed cap and returns 0.

Hmm, perhaps a config space access race?  Everything seems properly
locked on my kernel.

> This leads to three different errors I saw:
> 
> Case 1: pci_enable_msix() calls pci_msi_check_device(dev, nvec, 
> PCI_CAP_ID_MSIX), which calls pci_find_capability() and further returns 
> EINVAL
> 
> Case 2: pci_enable_msix() calls pci_msix_table_size(), which calls 
> pci_find_capability(dev, PCI_CAP_ID_MSIX), which returns 0. As a result, 
> pci_msix_table_size() returns 0. Then the following code:
> nr_entries = pci_msix_table_size(dev);
> if (nvec > nr_entries)
>     return nr_entries;
> causes pci_enable_msix() to return 0 to the caller, so now the caller thinks 
> that pci_enable_msix() succeeded! However, IRQs were not assigned, so 
> further call to request_threaded_irq() fails.
> (This looks like a kernel bug.)
>
> Case 3: pci_enable_msix() calls msix_capability_init(), which calls 
> pci_find_capability(dev, PCI_CAP_ID_MSIX), which fails and returns pos==0.
> So now msix_capability_init() does not check the return value and proceeds 
> with setting things up, and eventually fails in call to msix_map_region() => 
> ioremap_nocache(), which returns NULL. So msix_capability_init() returns 
> ENOMEM and further pci_enable_msix() returns ENOMEM.
> (So this is another kernel bug of not checking return value of 
> pci_find_capability(dev, PCI_CAP_ID_MSIX).)

Case 1 is the one we expect to hit.  Case 2 & 3 only happen on this path
if pci_find_capability returns non-deterministic results, so their
behavior may be reasonable.

> So bottom line is that different failures I saw stem from the same issue of 
> (sometimes) not finding PCI_CAP_ID_MSIX. It always happens on the second 
> call to KVM_ASSIGN_DEV_IRQ. Why are two calls required? For example, the 
> below trace shows the first call succeeding and setting up interrupts 
> properly, and the second call failing:
> 
> kernel: [ 4851.332462] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> kernel: [ 4851.333553] pci-stub 0000:03:12.6: irq 257 for MSI/MSI-X
> kernel: [ 4851.333573] pci-stub 0000:03:12.6: irq 258 for MSI/MSI-X
> kernel: [ 4851.333591] pci-stub 0000:03:12.6: irq 259 for MSI/MSI-X
> kernel: [ 4851.333600] kvm-7636: assigned_device_enable_host_msix(): call 
> request_threaded_irq() assigned_dev_id=918 #0: [v=257 e=0]
> kernel: [ 4851.333845] kvm-7636: assigned_device_enable_host_msix(): call 
> request_threaded_irq() assigned_dev_id=918 #1: [v=258 e=1]
> kernel: [ 4851.334055] kvm-7636: assigned_device_enable_host_msix(): call 
> request_threaded_irq() assigned_dev_id=918 #2: [v=259 e=2]
> kernel: [ 4851.334241] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 OK
> kernel: [ 4851.396339] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> kernel: [ 4851.396353] kvm-???: pci_find_capability 
> __pci_find_next_cap(devfn=150 pos=0) dev=0000:03:12.6 returned 0
> kernel: [ 4851.412754] kvm-???: msix_map_region ioremap_nocache(0x100000, 
> nr_entries=2048 PCI_MSIX_ENTRY_SIZE=16 dev=0000:03:12.6 pos=0 NULL
> kernel: [ 4851.429672] kvm-???: msix_capability_init 
> msix_map_region(dev=0000:03:12.6, pos=0 control=32767) failed NULL
> kernel: [ 4851.438726] kvm-???: pci_enable_msix(0000:03:12.6) 
> msix_capability_init failed, err=-12
> kernel: [ 4851.465439] kvm-7636: assigned_device_enable_host_msix() 
> pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6 
> err=-12
> kernel: [ 4851.484072] kvm-7636: assigned_device_enable_host_msix() 
> assigned_dev_id=918 #0: [v=0 e=0]
> kernel: [ 4851.484076] kvm-7636: assigned_device_enable_host_msix() 
> assigned_dev_id=918 #1: [v=0 e=1]
> kernel: [ 4851.484081] kvm-7636: assigned_device_enable_host_msix() 
> assigned_dev_id=918 #2: [v=0 e=2]
> kernel: [ 4851.484086] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 
> r=-12

It's hard to say why it gets setup twice without a trace from qemu.  It
could be that the guest temporarily disabled MSI-X support and
re-enabled it, which causes a teardown and re-setup in the host.

> Do you think the issue is the second KVM call, or the issue is the HW not 
> reporting the capability sometimes?

The issue is not the second KVM call, nor do I expect that the HW is
actually failing to return the capability.  I suspect it's a race where
some concurrent access to PCI config space is moving the window register
while pci_find_capability is walking config space.  dmesg on the host
would probably tell us which config access mechanism you're using.

> Looking at the latest 3.7 kernel code sources, I see that all relevant 
> functions (pci_enable_msix, pci_find_capability and msix_capability_init) 
> would behave the same way WRT to pci_find_capability() call. And anyways, 
> moving to upstream kernel is not an option for me at this point.

You've shown that you're getting a different result out of those
functions, but not why, so it's a bit premature to ignore the fact that
the problem may already be fixed in one of the past 5 kernel release.
If it is fixed, the patch(es) can be identified and backported to your
distro kernel.  If it's not fixed, at least we're all looking at the
same source code.

> Please let me know what do you think the next step would be.

dmesg + try v3.7.  I don't think you need to try upstream qemu-kvm at
this point since you've identified suspicious behavior that userspace
shouldn't be able to cause.  Otherwise check locking around the PCI
config accesses that you're using and make sure that userspace accesses
through pci-sysfs aren't bypassing anything.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux