On Tue, Apr 02, 2024, Mathias Nyman wrote: > Hi Thinh > > On 2.4.2024 3.50, Thinh Nguyen wrote: > > Hi Mathias, > > > > We're getting a NULL pointer dereference bug for this patch. To > > reproduce this, just unload and reload the xhci driver while a device is > > connected. It may take a few times to hit the issue. > > > > [ 1515.737623] BUG: kernel NULL pointer dereference, address: 000000000000000c > > [ 1515.737629] #PF: supervisor read access in kernel mode > > [ 1515.737631] #PF: error_code(0x0000) - not-present page > > [ 1515.737633] PGD 0 P4D 0 > > [ 1515.737636] Oops: 0000 [#1] PREEMPT SMP NOPTI > > [ 1515.737639] CPU: 1 PID: 2905 Comm: kworker/1:0 Not tainted 6.9.0-rc2-snps-detached-HEAD-g5bab5dc780c9 #8 > > [ 1515.737642] Hardware name: System manufacturer System Product Name/PRIME Q370M-C, BIOS 2605 10/16/2019 > > [ 1515.737643] Workqueue: usb_hub_wq hub_event > > [ 1515.737648] RIP: 0010:trace_event_raw_event_xhci_log_free_virt_dev+0x64/0xb7 [xhci_hcd] > > [ 1515.737667] Code: 85 c0 74 59 48 89 58 08 48 8b 53 18 48 89 e7 48 8b 52 10 48 89 50 18 48 8b 53 10 48 8b 52 10 48 89 50 10 48 8b 93 90 11 00 00 <8b> 52 0c 89 50 20 48 8b 93 90 11 00 00 8b 52 08 89 50 24 8b 93 b0 > > [ 1515.737669] RSP: 0018:ffffac30c0bdbce8 EFLAGS: 00010086 > > [ 1515.737671] RAX: ffff99e0003309b8 RBX: ffff99e040fb4000 RCX: ffff99e0003309b4 > > [ 1515.737673] RDX: 0000000000000000 RSI: 0000000000000616 RDI: ffffac30c0bdbce8 > > [ 1515.737729] RBP: ffff99e007b58a20 R08: 0000000000000002 R09: 0000000000000010 > > [ 1515.737731] R10: 000000014d33cf30 R11: 0000000000000020 R12: 0000000000000001 > > [ 1515.737733] R13: ffff99e008c9a294 R14: ffff99e008c9a258 R15: ffff99e004ca5000 > > [ 1515.737734] FS: 0000000000000000(0000) GS:ffff99e35bc40000(0000) knlGS:0000000000000000 > > [ 1515.737736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 1515.737738] CR2: 000000000000000c CR3: 00000001024d6004 CR4: 00000000003706f0 > > [ 1515.737740] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 1515.737741] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > [ 1515.737743] Call Trace: > > [ 1515.737746] <TASK> > > [ 1515.737748] ? __die_body+0x1a/0x5c > > [ 1515.737751] ? page_fault_oops+0x2ea/0x380 > > [ 1515.737806] ? sched_clock+0x10/0x23 > > [ 1515.737808] ? trace_clock_local+0x10/0x23 > > [ 1515.737812] ? exc_page_fault+0xfe/0x11e > > [ 1515.737815] ? asm_exc_page_fault+0x26/0x30 > > [ 1515.737818] ? trace_event_raw_event_xhci_log_free_virt_dev+0x64/0xb7 [xhci_hcd] > > [ 1515.737832] ? trace_event_raw_event_xhci_log_free_virt_dev+0x39/0xb7 [xhci_hcd] > > [ 1515.737844] xhci_free_virt_device+0x76/0x1d8 [xhci_hcd] > > [ 1515.737856] xhci_free_dev+0x111/0x12f [xhci_hcd] > > [ 1515.737867] hub_event+0xca6/0x137e > > Thanks for the report, and for debugging this. > I was able to reproduce it myself using your steps. > > This triggers if xhci tracing is enabled and usb device is freed before it's addressed > as vdev->rhub_port is only set during addressing. > > > > > > > To capture the tracepoints and avoid the NULL pointer, I made this > > change: > > > > I think we could skip printing the hcd_portnum and hw_portnum during > xhci_free_virt_dev() tracing. I haven't really found them useful. > > It would make sense to print the slot_id instead. > > how about this: > > diff --git a/drivers/usb/host/xhci-trace.h b/drivers/usb/host/xhci-trace.h > index 1740000d54c2..5762564b9d73 100644 > --- a/drivers/usb/host/xhci-trace.h > +++ b/drivers/usb/host/xhci-trace.h > @@ -172,8 +172,7 @@ DECLARE_EVENT_CLASS(xhci_log_free_virt_dev, > __field(void *, vdev) > __field(unsigned long long, out_ctx) > __field(unsigned long long, in_ctx) > - __field(int, hcd_portnum) > - __field(int, hw_portnum) > + __field(int, slot_id) > __field(u16, current_mel) > ), > @@ -181,13 +180,12 @@ DECLARE_EVENT_CLASS(xhci_log_free_virt_dev, > __entry->vdev = vdev; > __entry->in_ctx = (unsigned long long) vdev->in_ctx->dma; > __entry->out_ctx = (unsigned long long) vdev->out_ctx->dma; > - __entry->hcd_portnum = (int) vdev->rhub_port->hcd_portnum; > - __entry->hw_portnum = (int) vdev->rhub_port->hw_portnum; > + __entry->slot_id = (int) vdev->slot_id; > __entry->current_mel = (u16) vdev->current_mel; > ), > - TP_printk("vdev %p ctx %llx | %llx hcd_portnum %d hw_portnum %d current_mel %d", > - __entry->vdev, __entry->in_ctx, __entry->out_ctx, > - __entry->hcd_portnum, __entry->hw_portnum, __entry->current_mel > + TP_printk("vdev %p slot %d ctx %llx | %llx current_mel %d", > + __entry->vdev, __entry->slot_id, __entry->in_ctx, > + __entry->out_ctx, __entry->current_mel > ) > ); > > It works for me > That looks good to me. Can you submit the change? On an unrelated note, often we have to debug xHCI driver on a system with multiple xHCI controllers. I'm not sure if there's a good way to filter the xHCI tracepoints to a specific controller? I needed to print the devname for each tracepoint just to get around this, which doesn't seem like a great solution. Any idea? Thanks, Thinh