Re: USB transaction errors causing RCU stalls and kernel panics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4.3.2020 16.12, Oliver Neukum wrote:
> Am Mittwoch, den 04.03.2020, 14:11 +0200 schrieb Mathias Nyman:
>> On 3.3.2020 22.08, Jonas Karlsson wrote:
> 
>>
>> I recently got a report about a bit similar issue on a 4.4 stable kernel, so this
>> might not be xhci-cdns specific.
>>
>> That case involved autosuspend of the cdc-acm, and there was only a short burst of
>> transaction erros and resubmitted URBs even if the device was supposed to be suspended.
>> It looks like cdc_acm autosuspended even if it had URBs pending.
> 
> That must not happen. Do you have details?

Shared what I got in:
https://drive.google.com/open?id=1PjwmIK97bIfugWL697lJCe65yuDVOZhx

cdc-acm is unfortunately not traced, but usb core and xhci is.
grep for "MATTU" to find the relevant places. 

> 
>> I'm guessing that in that case the transfer ring restarted even if link was already "suspeded",
>> causing transaction errors. Ring could be restarted if URBs were resubmitted
>> by the class driver when usb core suspends all interfaces, flushing all pending URBs which
>> calls the URB completion handler.
> 
> How should a class driver do that? It gets -EPROTO and that's it,
> 
> 	Regards
> 		Oliver
> 

This is just my speculation, I haven't checked details yet.

It's not just the class driver that causes this, but a combination of
the following gaps in xhci, cdc-acm, and usb core.

A class driver autosuspending with URBs pending,
and not killing all URBs synchronously when usb core calls suspend for the interface drivers.

xhci restarting an endpoint ring due to a race between xhci_stop_device(), handling 
a Transaction error, and having pending URB (unhandled trb) on the ring. [1]

Sleeping between setting port to U3 link state, and clearing udev->can_submit.
allowing URBs to be submitted during that gap when link is in U3 already.
URBs will complete with -EPROTO, and resubmitted until udev->can_submit is cleared.

usb_runtime_suspend()      
  usb_suspend_both() // suspend a device and its interfaces
    for (each udev->actconfig->interface)
      usb_suspend_interface(udev, intf, msg);
	driver = to_usb_driver(intf->dev.driver);
	status = driver->suspend(intf, msg);  // URBs shuold be killed, sync, miss one??
    usb_suspend_device()
      generic_suspend()
        usb_port_suspend()
          hub_set_port_link_state(USB_SS_PORT_LS_U3)
            xhci_stop_device()
            xhci_set_link_state(USB_SS_PORT_LS_U3)  // port link is in U3
            msleep()       // during this URBs can be resubmitteded and complete with -EPROTO loop       
    udev->can_submit = 0;  // we can submit URBs until here (except the killed ones, they are flagged).
    for (each endpoint)
      usb_hcd_flush_endpoint(udev, udev->ep_out[i]);
      usb_hcd_flush_endpoint(udev, udev->ep_in[i]);

[1]
xhci_stop_device()
  xhci_queue_stop_endpoint()
-> interrupt (transfer event, ring has not stopped yet)
  handle_tx_event()  // bulk transfer with Transaction error
    process_bulk_intr_td()
      finish_td()
	xhci_cleanup_halted_endpoint()
          xhci_queue_reset_ep()
          xhci_queue_new_dequeue_state()
-> interupt, command completion event for stop endpoint, 
-> interrupt handle reset_ep command, xhci_handle_cmd_stop_ep()
-> interrupt handle new deq state
      xhci_handle_cmd_set_deq()
	ring_doorbell_for_active_rings(xhci, slot_id, ep_index)  - this restarts the ring.

-Mathias



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux