> -----Original Message-----
> From: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, October 17, 2023 10:06 PM
> To: Li, Meng <Meng.Li@xxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>; Ingo Molnar
> <mingo@xxxxxxxxxx>; USB mailing list <linux-usb@xxxxxxxxxxxxxxx>;
> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>; linux-rt-users
> <linux-rt-users@xxxxxxxxxxxxxxx>
> Subject: Re: USB: add check to detect host controller hardware removal
>
> On Tue, Oct 17, 2023 at 02:23:05AM +0000, Li, Meng wrote:
> > I did some debugs on my side.
> > Firstly, the local_irq_disable_nort() function had been removed from
> > latest RT kernel.
>
> What's in the RT kernel doesn't matter here, because the code you're
> patching belongs to the vanilla kernel.
>
> > Second, because of creating xhci-pci.c, the commit c548795abe0d("USB:
> > add check to detect host controller hardware removal") is no longer useful.
> > Because the function usb_remove_hcd() is invoked from xhci_pci_remove()
> > of file xhci-pci.c in advance.
>
> What about for non-xHCI controllers?

I will try non-xHCI controllers later, if I can find one on my side.

> > I am trying to fix this issue with getting register status directly
> > without local_irq_disable().
>
> Were you able to locate the original bug report?

This is the original bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=579093

My latest debug information is below.
When I removed the PCIe-USB card, I got the following exception call trace
while the driver was accessing a host controller register.

Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP
Modules linked in:
CPU: 3 PID: 329 Comm: usb-storage Tainted: G W 6.1.53-rt10-yocto-preempt-rt #1
Hardware name: LS1043A RDB Board (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : xhci_ring_ep_doorbell+0x78/0x120
lr : xhci_queue_bulk_tx+0x3b0/0x8a4
sp : ffff80000b0e3960
x29: ffff80000b0e3960 x28: ffff000004ce2290 x27: ffff000008e71100
x26: ffff000005718a80 x25: 0000000000000421 x24: 000000000000001f
x23: ffff000008e71108 x22: 0000000000000000 x21: ffff8000099e5100
x20: 0000000000000002 x19: 0000000000000004 x18: 0000000000000000
x17: 0000000000000008 x16: ffff00007b5cfb00 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
x11: 0000000000000001 x10: 0000000000000a40 x9 : ffff8000089b0b50
x8 : ffff0000057c9a00 x7 : 000000000000001f x6 : ffff0000056c8000
x5 : ffff800009714ca0 x4 : 0000000000000004 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff8000099e5108 x0 : ffff000004ce2290
Call trace:
 xhci_ring_ep_doorbell+0x78/0x120
 xhci_queue_bulk_tx+0x3b0/0x8a4
 xhci_urb_enqueue+0x274/0x510
 usb_hcd_submit_urb+0xc0/0x8b0
 usb_submit_urb+0x29c/0x5c0
 usb_stor_msg_common+0x9c/0x190
 usb_stor_bulk_transfer_buf+0x58/0x110
 usb_stor_Bulk_transport+0xdc/0x380
 usb_stor_invoke_transport+0x40/0x530
 usb_stor_transparent_scsi_command+0x18/0x24
 usb_stor_control_thread+0x20c/0x2a0
 kthread+0x12c/0x130
 ret_from_fork+0x10/0x20
Code: 540001cc 8b140aa1 d5033ebf b9000033 (b9400021)
---[ end trace 0000000000000000 ]---

Because of the exception, the xhci->lock taken in xhci_urb_enqueue() is not
released normally. In this situation, if I remove the PCIe device with the
command below
# echo 1 > /sys/bus/pci/devices/<PCIe ID>/remove
the code hangs at the xhci->lock in the xhci_urb_dequeue() function.
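
To illustrate the hang, here is a minimal user-space sketch. It is only an
analogy, not kernel code: it assumes a pthread mutex as a stand-in for
xhci->lock, and the names demo_lock, enqueue_then_abort(),
dequeue_would_block() and run_demo() are made up for this example. One thread
takes the lock and terminates without unlocking, as the enqueue path does when
the register access faults; a later locker then finds the lock still held.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for xhci->lock (a pthread mutex, not a spinlock). */
pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models the enqueue path: take the lock, then die before unlocking. */
void *enqueue_then_abort(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    /* The faulting register access would happen here; no unlock follows. */
    pthread_exit(NULL);
}

/*
 * Models the dequeue path. trylock is used instead of lock so that the
 * demo can report the problem rather than block forever.
 */
bool dequeue_would_block(void)
{
    int ret = pthread_mutex_trylock(&demo_lock);

    if (ret == 0) {
        /* Lock was free, so a real lock call would not have hung. */
        pthread_mutex_unlock(&demo_lock);
        return false;
    }
    return ret == EBUSY;
}

/* Runs the aborted "enqueue", then checks whether "dequeue" would block. */
bool run_demo(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, enqueue_then_abort, NULL) != 0)
        return false;
    pthread_join(t, NULL);
    return dequeue_would_block();
}
```

Calling run_demo() from a small main() (built with -pthread) is expected to
report that the second locker would block, which corresponds to the hang in
xhci_urb_dequeue() described above.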

Even if I follow commit c548795abe0d and move usb_hcd_irq(0, hcd) into
xhci_pci_remove(), there is still an exception call trace ("Internal error:
synchronous external abort") when executing readl(&xhci->op_regs->status).
Although the code doesn't hang, the exception aborts xhci_pci_remove(), so
the code that follows is not executed.

Because usb_hcd_irq(0, hcd) causes an exception, I think it should be removed.
In addition, suddenly removing a PCIe card is not a reasonable operation; it
may damage the hardware. So I think we don't need to add usb_hcd_irq(0, hcd)
on the path that unbinds the PCIe driver.

Thanks,
Limeng

> Alan Stern