Re: [BUG] USB xHCI driver NULL pointer dereference

"Karel Balej" <balejk@xxxxxxxxx> · Fri, 16 Aug 2024 09:35:13 +0200

Mathias Nyman, 2024-08-15T16:10:32+03:00:
> On 14.8.2024 16.28, Mathias Nyman wrote:
> > On 13.8.2024 14.49, Mathias Nyman wrote:
> >> On 11.8.2024 1.11, Karel Balej wrote:
> >>> Hello,
> >>>
> >>> my machine crashed twice in the past week, the second time I have been
> >>> able to recover the log output (including the stack trace run through
> >>> scripts/decode_stacktrace.sh) which seems to suggest a bug in the xHCI
> >>> driver:
> >
> >>
> >> You have an unlucky setup here.
> >> This can only happen when a full-speed device fails enumeration while connected to a
> >> Panther Point xHC.
> >>
> >> Only the Panther Point xHC (PCI_ID 0x1e31) does bandwidth calculation in software and
> >> calls xhci_reserve_bandwidth(). In this case we unintentionally end up calling it
> >> after a failed address device attempt, when the USB core re-inits endpoint 0 before retrying.
> >> At this point the xHCI side of the device isn't properly allocated or set up, so
> >> we hit a NULL pointer dereference.
> >>
> >> I'll look into it more.
> > 
> > The following code should resolve this issue, any chance you could try it out?
>
> I was able to trigger this myself by forcing XHCI_SW_BW_CHECKING and faking failure on
> address device command:
>
> [  270.538134] usb 3-6: new full-speed USB device number 3 using xhci_hcd
> [  270.670313] xhci_hcd 0000:00:14.0: Faking a Device not respoinding to setup address
> [  270.886142] usb 3-6: device not accepting address 3, error -71
> [  270.892091] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  270.899034] #PF: supervisor read access in kernel mode
> [  270.904150] #PF: error_code(0x0000) - not-present page
> [  270.909267] PGD 0 P4D 0
> [  270.911799] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  270.916660] CPU: 3 UID: 0 PID: 301 Comm: kworker/3:2 Tainted: G        W          6.11.0-rc1+ #4291
> [  270.925651] Tainted: [W]=WARN
> [  270.928615] Workqueue: usb_hub_wq hub_event
> [  270.932787] RIP: 0010:xhci_reserve_bandwidth+0x243/0x6d0 [xhci_hcd]
>
> The code snippet I suggested did fix the NULL pointer dereference.
>
> I'll turn it into a proper patch.
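
For illustration only, here is a toy userspace C model of the failure mode described above: a software bandwidth check that runs against a device whose bandwidth state was never set up. Every name here (struct virt_dev, struct bw_table, reserve_bandwidth()) is hypothetical and heavily simplified; this is not the real xhci_hcd code and not the patch discussed in this thread. It only shows the general idea of refusing the request instead of dereferencing a NULL pointer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real xhci_hcd structures. */
struct bw_table {
	unsigned int used;   /* bandwidth already reserved */
	unsigned int limit;  /* total bandwidth available */
};

struct virt_dev {
	/* Stays NULL if device setup failed (e.g. after a failed
	 * address device command in this toy model). */
	struct bw_table *bw_table;
};

/*
 * Software bandwidth reservation (toy version).
 * Returns 0 on success, -ENODEV if the device has no bandwidth state
 * yet, or -ENOSPC if the request would exceed the limit.
 */
static int reserve_bandwidth(struct virt_dev *dev, unsigned int amount)
{
	/* Guard: without this check, a half-initialised device would
	 * cause a NULL pointer dereference on dev->bw_table->used. */
	if (dev == NULL || dev->bw_table == NULL)
		return -ENODEV;

	if (amount > dev->bw_table->limit - dev->bw_table->used)
		return -ENOSPC;

	dev->bw_table->used += amount;
	return 0;
}
```

Returning an error from the check is just one possible design in this sketch; a real driver fix may instead ensure the bandwidth state is always initialised, or skip the check when no endpoints are actually added or removed.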

It seems that I'm too late with a Tested-by tag, but for what it's worth,
I ran the machine with your patch all day yesterday and didn't observe
any regression. I have not been able to verify whether it fixes the
issue, as I haven't found a way to deliberately trigger it, but it seems
that you were able to do that.

Thank you very much for looking into this.

Kind regards,
K. B.