Mathias Nyman, 2024-08-15T16:10:32+03:00:
> On 14.8.2024 16.28, Mathias Nyman wrote:
> > On 13.8.2024 14.49, Mathias Nyman wrote:
> >> On 11.8.2024 1.11, Karel Balej wrote:
> >>> Hello,
> >>>
> >>> my machine crashed twice in the past week, the second time I have been
> >>> able to recover the log output (including the stack trace run through
> >>> scripts/decode_stacktrace.sh) which seems to suggest a bug in the xHCI
> >>> driver:
> >
> >>
> >> You have a unlucky setup here.
> >> This could only happen when a full speed device fails enumeration while connected to a
> >> Pantherpoint xHC.
> >>
> >> Only Pantherpoint xHC (PCI_ID 0x1e31) does bandwidth calculation in software and
> >> calls xhci_reserve_bandwidth(). In this case we unintentionally end up calling it
> >> after a failed address device attempt when usb core re-inits endpoint 0 before retry.
> >> At this point the xhci side of the device isn't properly allocated or set up so
> >> we hit a NULL pointer dereference.
> >>
> >> I'll look into it more.
> >
> > The following code should resolve this issue, any chance you could try it out?
> >
> I was able to trigger this myself by forcing XHCI_SW_BW_CHECKING and faking failure on
> address device command:
>
> [ 270.538134] usb 3-6: new full-speed USB device number 3 using xhci_hcd
> [ 270.670313] xhci_hcd 0000:00:14.0: Faking a Device not respoinding to setup address
> [ 270.886142] usb 3-6: device not accepting address 3, error -71
> [ 270.892091] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [ 270.899034] #PF: supervisor read access in kernel mode
> [ 270.904150] #PF: error_code(0x0000) - not-present page
> [ 270.909267] PGD 0 P4D 0
> [ 270.911799] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 270.916660] CPU: 3 UID: 0 PID: 301 Comm: kworker/3:2 Tainted: G W 6.11.0-rc1+ #4291
> [ 270.925651] Tainted: [W]=WARN
> [ 270.928615] Workqueue: usb_hub_wq hub_event
> [ 270.932787] RIP: 0010:xhci_reserve_bandwidth+0x243/0x6d0 [xhci_hcd]
>
> The codesnippet I suggested did fix the null pointer dereference.
>
> I'll turn it into a proper patch

It seems that I'm too late with a Tested-by tag but for what it's worth,
I have been running the machine with your patch the whole day yesterday
and didn't observe any regression.

I have not been able to verify if it fixed the issue as I haven't found
a way to deliberately trigger it, but it seems that you were able to do
that.

Thank you very much for looking into this.

Kind regards,
K. B.