Re: NULL pointer dereference in xhci_suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



19.04.18 14:17, Mathias Nyman пише:
> On 18.04.2018 20:30, Nazar Mokrynskyi wrote:
>> Hi everyone, I'm running a VM using libvirt on kernel 4.15 (Ubuntu 18.04).
>>
>> ASMedia USB 3.1 Gen 2 controller is assigned to VM like this:
>>
>>      <hostdev mode='subsystem' type='pci' managed='yes'>
>>        <source>
>>          <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
>>        </source>
>>        <rom bar='off'/>
>>        <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
>>      </hostdev>
>>
>> Occasionally VM start fails with following error and I'm unable to start a VM anymore until reboot:
>>
>> [40061.976095] BUG: unable to handle kernel NULL pointer dereference at 0000000000000228
>> [40061.976107] IP: xhci_suspend+0x3a/0x490
>> [40061.976109] PGD 0 P4D 0
>> [40061.976115] Oops: 0000 [#1] SMP PTI
>> [40061.976118] Modules linked in: cpuid vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter binfmt_misc snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate nls_iso8859_1 mxm_wmi snd_hda_intel snd_usb_audio snd_hda_codec snd_usbmidi_lib snd_hda_core snd_rawmidi snd_hwdep snd_seq_device joydev input_leds intel_rapl_perf snd_pcm cdc_acm snd_timer snd mei_me soundcore mei shpchp wmi nvidia_uvm(POE) acpi_pad mac_hid sch_fq_codel lp parport bfq ip_tables
>> [40061.976185]  x_tables autofs4 btrfs xor zstd_compress raid6_pq algif_skcipher af_alg dm_crypt hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul nvidia_drm(POE) ghash_clmulni_intel nvidia_modeset(POE) pcbc i915 nvidia(POE) aesni_intel aes_x86_64 crypto_simd i2c_algo_bit glue_helper cryptd drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ptp nvme pps_core drm nvme_core ahci ipmi_devintf libahci ipmi_msghandler video
>> [40061.976230] CPU: 7 PID: 29745 Comm: kworker/7:0 Tainted: P           OE    4.15.0-17-generic #18-Ubuntu
>> [40061.976232] Hardware name: Micro-Star International Co., Ltd. MS-7B45/Z370 GAMING PRO CARBON (MS-7B45), BIOS A.51 03/29/2018
>> [40061.976239] Workqueue: pm pm_runtime_work
>> [40061.976245] RIP: 0010:xhci_suspend+0x3a/0x490
>> [40061.976248] RSP: 0018:ffffa5dca0747c40 EFLAGS: 00010246
>> [40061.976252] RAX: 00000000ffffffea RBX: ffff913dc3106000 RCX: 0000000000000000
>> [40061.976254] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff913dc3106230
>> [40061.976257] RBP: ffffa5dca0747c78 R08: 0000000000000000 R09: ffffa5dca0747db8
>> [40061.976259] R10: 0000000000000000 R11: 00000000000001cc R12: ffff913dc3106000
>> [40061.976262] R13: ffff913dc3106000 R14: ffff913dc3106230 R15: ffffffffb86efaa0
>> [40061.976265] FS:  0000000000000000(0000) GS:ffff913e2e7c0000(0000) knlGS:0000000000000000
>> [40061.976268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [40061.976271] CR2: 0000000000000228 CR3: 000000031460a003 CR4: 00000000003606e0
>> [40061.976274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [40061.976276] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [40061.976278] Call Trace:
>> [40061.976286]  ? pci_pm_runtime_resume+0xa0/0xa0
>> [40061.976292]  xhci_pci_suspend+0x5a/0xd0
>> [40061.976297]  suspend_common+0x85/0x160
>> [40061.976302]  hcd_pci_runtime_suspend+0x1b/0x50
>> [40061.976306]  pci_pm_runtime_suspend+0x64/0x180
>> [40061.976311]  ? pci_pm_runtime_resume+0xa0/0xa0
>> [40061.976315]  __rpm_callback+0xca/0x210
>> [40061.976320]  ? __switch_to_asm+0x34/0x70
>> [40061.976324]  ? __switch_to_asm+0x40/0x70
>> [40061.976329]  rpm_callback+0x24/0x80
>> [40061.976333]  ? pci_pm_runtime_resume+0xa0/0xa0
>> [40061.976338]  rpm_suspend+0x137/0x640
>> [40061.976343]  rpm_idle+0x58/0x2a0
>> [40061.976348]  pm_runtime_work+0x92/0xa0
>> [40061.976353]  process_one_work+0x1de/0x410
>> [40061.976357]  worker_thread+0x32/0x410
>> [40061.976362]  kthread+0x121/0x140
>> [40061.976366]  ? process_one_work+0x410/0x410
>> [40061.976371]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [40061.976376]  ? do_syscall_64+0x73/0x130
>> [40061.976380]  ? SyS_exit_group+0x14/0x20
>> [40061.976383]  ret_from_fork+0x35/0x40
>> [40061.976387] Code: 41 54 53 48 83 ec 10 4c 8b 2f 41 8b 85 28 02 00 00 85 c0 0f 84 d1 01 00 00 83 f8 04 0f 85 51 04 00 00 48 8b 57 08 b8 ea ff ff ff <83> ba 28 02 00 00 04 0f 85 04 01 00 00 41 89 f6 49 89 fc e8 9e
>> [40061.976443] RIP: xhci_suspend+0x3a/0x490 RSP: ffffa5dca0747c40
>> [40061.976445] CR2: 0000000000000228
>> [40061.976449] ---[ end trace 5f17622537522ef8 ]---
>>
>> Nothing was attached to this controller when error happened.
>
> Looks like xhci controller was runtime suspended before everything was set up.
>
> Normally PCI core will enable runtime pm, forbid it, and increase usage counter (0 -> 1) before pci driver probe is called.
> xhci_pci_probe() increases usage count again (1 -> 2) before creating two hcds.
> the first hcd is created using  usb_hcd_pci_probe() which decreases usage (2 -> 1),
> xhci_pci_probe() creates and adds second hcd manually, after which it decreases usage (1 -> 0)
>
> I'm not sure how running a VM affects this, but in case usage counter isn't already increased
> before xhci_pci_probe() is called then we might runtime suspend after the first hcd is added,
> but before second one is set up.
>
> To know what is going on could you try adding the below code, and see if you get any
> new messages in dmesg.
>
>  diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 1aad89b..b16c634 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -317,6 +317,7 @@ static int xhci_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  dealloc_usb2_hcd:
>         usb_hcd_pci_remove(dev);
>  put_runtime_pm:
> +       dev_info(&dev->dev, "SUSPEND: FAIL in probe, still decrease usage");
>         pm_runtime_put_noidle(&dev->dev);
>         return retval;
>  }
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index da6dbe3..98bec53 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -863,8 +863,15 @@ int xhci_suspend(struct xhci_hcd *xhci, bool do_wakeup)
>         struct usb_hcd          *hcd = xhci_to_hcd(xhci);
>         u32                     command;
>  
> -       if (!hcd->state)
> +       if (!hcd || !hcd->state) {
> +               xhci_err(xhci, "SUSPEND: FAIL No hcd or hcd->state\n");
>                 return 0;
> +       }
> +
> +       if (!xhci->shared_hcd || !xhci->shared_hcd->state) {
> +               xhci_err(xhci, "SUSPEND: FAIL No shared hcd or state\n");
> +               return 0;
> +       }
>  
>         if (hcd->state != HC_STATE_SUSPENDED ||
>                         xhci->shared_hcd->state != HC_STATE_SUSPENDED)
>
>
> Thanks
> Mathias

Thanks for quick response!

I'll recompile latest development kernel with mentioned patch and let you know when it happens again.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux