Jean Rene Dawin wrote on Thu 28/05/20 10:51: > Oliver Neukum wrote on Wed 27/05/20 10:53: > > OK, we have two possibilities here. Either > > a4e7279cd1d19f48f0af2a10ed020febaa9ac092 or > > 0afccd7601514c4b83d8cc58c740089cc447051d > > Then I tested a4e7279cd1d19f48f0af2a10ed020febaa9ac092 with your patch > applied and it still showed the symptom Hi, more testing shows the crash can be triggered by - romving the battery the first time (but only sometimes) - re-insertiing battery and turning on the phone (after some interval) The trace when crashing looks like this: [ 122.890637] Call Trace: [ 122.890640] <IRQ> [ 122.890645] queue_work_on+0x36/0x40 [ 122.890650] __usb_hcd_giveback_urb+0x6f/0x120 [ 122.890653] usb_giveback_urb_bh+0xa6/0x100 [ 122.890657] tasklet_action_common.isra.0+0x5f/0x130 [ 122.890661] __do_softirq+0x111/0x34d [ 122.890665] irq_exit+0xac/0xd0 [ 122.890667] do_IRQ+0x89/0x140 [ 122.890670] common_interrupt+0xf/0xf [ 122.890672] </IRQ> Doing a function_graph ftrace on usb_giveback_urb_bh shows a difference between working and crashshing behaviour: Working: # remove battery 2802.875749 | 3) 0.331 us | usb_anchor_suspend_wakeups(); 2802.875749 | 3) 0.283 us | usb_unanchor_urb(); 2802.875750 | 3) 0.283 us | hub_irq(); # insert battery 2818.992447 | 3) 0.265 us | usb_anchor_suspend_wakeups(); 2818.992448 | 3) 0.279 us | usb_unanchor_urb(); 2818.992448 | 3) 0.277 us | hub_irq(); # turn on phone 2829.835833 | 3) 0.262 us | usb_anchor_suspend_wakeups(); 2829.835834 | 3) 0.273 us | usb_unanchor_urb(); 2829.835834 | 3) 0.294 us | hub_irq(); Crashing: # from dmesg [ 1537.742750] WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:1473 __queue_work+0x38a/0x430 # remove battery / turn on phone 1536.448472 | 3) 0.373 us | usb_anchor_suspend_wakeups(); 1536.448473 | 3) 0.280 us | usb_unanchor_urb(); 1536.448473 | 3) | acm_read_bulk_callback [cdc_acm]() { 1536.448474 | 3) 0.306 us | ktime_get_mono_fast_ns(); [...] 1536.748347 | 3) 0.279 us | usb_anchor_suspend_wakeups(); 1536.748348 | 3) 0.289 us | usb_unanchor_urb(); 1536.748348 | 3) | acm_write_bulk [cdc_acm]() { 1536.748349 | 3) | _raw_spin_lock_irqsave() { [...] 1537.749348 | 3) 0.292 us | usb_anchor_suspend_wakeups(); 1537.749348 | 3) 0.298 us | usb_unanchor_urb(); 1537.749349 | 3) | acm_write_bulk [cdc_acm]() { 1537.749349 | 3) | _raw_spin_lock_irqsave() { [...] 1537.749370 | 3) | queue_work_on() { 1537.749370 | 3) | __queue_work() { 1537.749370 | 3) 0.273 us | __rcu_read_lock(); 1537.749371 | 3) 0.451 us | get_work_pool(); 1537.749372 | 3) | _raw_spin_lock() { 1537.749372 | 3) 0.270 us | preempt_count_add(); 1537.749373 | 3) 0.836 us | } 1537.749373 | 3) | do_invalid_op() { 1537.749374 | 3) 0.364 us | uprobe_get_trap_addr(); 1537.749374 | 3) | do_error_trap() { 1537.749375 | 3) | is_valid_bugaddr() { 1537.749375 | 3) | __probe_kernel_read() { 1537.749376 | 3) | __check_object_size() { 1537.749376 | 3) 0.292 us | check_stack_object(); 1537.749377 | 3) 0.397 us | __virt_addr_valid(); To me it looks like the problem arises when urb->complete(urb) is called in static void __usb_hcd_giveback_urb(struct urb *urb) from drivers/usb/core/hcd.c: 1641 usb_anchor_suspend_wakeups(anchor); 1642 usb_unanchor_urb(urb); 1643 if (likely(status == 0)) 1644 usb_led_activity(USB_LED_EVENT_HOST); 1645 1646 /* pass ownership to the completion handler */ 1647 urb->status = status; 1648 urb->complete(urb); If the "wrong" function is set in urb->complete I see the crash. In the normal case hub_irq() seems to be set. In the crashing case something like acm_write_bulk. May this be the cause? Regards, Jean Rene