On 11.8.2024 1.11, Karel Balej wrote:
Hello,

my machine crashed twice in the past week, the second time I have been able to recover the log output (including the stack trace run through scripts/decode_stacktrace.sh) which seems to suggest a bug in the xHCI driver:

[44193.556677] usb 2-1-port5: disabled by hub (EMI?), re-enabling...
[44193.556692] usb 2-1.5: USB disconnect, device number 6
[44193.558532] cdc_ncm 2-1.5:1.0 enp0s29u1u5: unregister 'cdc_ncm' usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP)
[44193.739545] usb 2-1.5: new high-speed USB device number 7 using ehci-pci
[44193.819628] usb 2-1.5: New USB device found, idVendor=18d1, idProduct=d001, bcdDevice= 6.10
[44193.819637] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[44193.819641] usb 2-1.5: Product: Samsung Galaxy Core Prime VE LTE
[44193.819644] usb 2-1.5: Manufacturer: Samsung
[44193.819646] usb 2-1.5: SerialNumber: postmarketOS
[44193.842472] cdc_ncm 2-1.5:1.0: MAC-Address: [...]
[44193.842770] cdc_ncm 2-1.5:1.0 usb0: register 'cdc_ncm' at usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP), [...]
[44193.845829] cdc_ncm 2-1.5:1.0 enp0s29u1u5: renamed from usb0
[46253.017991] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[46709.344533] usb 3-1: new full-speed USB device number 3 using xhci_hcd
[46709.458560] usb 3-1: device descriptor read/64, error -71
[46709.679562] usb 3-1: device descriptor read/64, error -71
[46709.895544] usb 3-1: new full-speed USB device number 4 using xhci_hcd
[46710.009563] usb 3-1: device descriptor read/64, error -71
[46710.231579] usb 3-1: device descriptor read/64, error -71
[46710.333629] usb usb3-port1: attempt power cycle
[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
[46710.713699] usb 3-1: Device not responding to setup address.
[46710.917684] usb 3-1: Device not responding to setup address.
[46711.125536] usb 3-1: device not accepting address 5, error -71
[46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
[46711.125600] #PF: supervisor read access in kernel mode
[46711.125603] #PF: error_code(0x0000) - not-present page
[46711.125606] PGD 0 P4D 0
[46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
[46711.125620] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77-D3H, BIOS F18 08/21/2012
[46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
[46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2392 drivers/usb/host/xhci.c:2758) xhci_hcd

Thanks for the report.

You have an unlucky setup here. This could only happen when a full-speed device fails enumeration while connected to a Pantherpoint xHC.

Only the Pantherpoint xHC (PCI_ID 0x1e31) does bandwidth calculation in software and calls xhci_reserve_bandwidth(). In this case we unintentionally end up calling it after a failed address device attempt, when usb core re-inits endpoint 0 before the retry. At this point the xhci side of the device isn't properly allocated or set up, so we hit a NULL pointer dereference.

I'll look into it more.

Thanks
Mathias
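
To illustrate the failure mode described above, here is a minimal user-space sketch in C. It is not the kernel code and not a proposed fix: the struct names, the field layout, and model_reserve_bandwidth() are all hypothetical stand-ins invented for this example. It only shows the general pattern of a bandwidth-reservation helper being reached for a device whose per-device state was never allocated, and how a guard turns the NULL pointer dereference into an error return. The faulting address 0x8 in the log is consistent with reading a member at a small offset from a NULL pointer, but which member that is in the real driver is not something this sketch claims to know.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's real structures. */
struct model_bw_table {
	unsigned int reserved;	/* bandwidth units reserved so far */
};

struct model_virt_dev {
	void *slot_ctx;				/* placeholder first member */
	struct model_bw_table *bw_table;	/* NULL until the device is set up */
};

/*
 * Reserve bandwidth for a device. Returns 0 on success, or -1 when the
 * device-side state was never allocated (the failed-enumeration case).
 * Without the guard, the access below would dereference a NULL pointer.
 */
static int model_reserve_bandwidth(struct model_virt_dev *vdev,
				   unsigned int units)
{
	if (vdev == NULL || vdev->bw_table == NULL)
		return -1;

	vdev->bw_table->reserved += units;
	return 0;
}
```

Calling model_reserve_bandwidth(NULL, 4) in this model returns -1 instead of faulting, while a fully set up device has its reserved count increased. Whether the real driver should guard at this point or avoid the call altogether is a separate question that the sketch does not try to answer.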