Hi Johan, On Mon, Aug 17, 2015 at 5:29 PM, Johan Hovold <johan@xxxxxxxxxx> wrote: > On Tue, Aug 11, 2015 at 12:31:47PM +0200, Yegor Yefremov wrote: >> On Tue, Aug 11, 2015 at 11:58 AM, Bjørn Mork <bjorn@xxxxxxx> wrote: >> > [replaced 'netdev' with 'linux-usb' as this concerns a USB serial driver only] >> > >> > Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> writes: >> > >> >> I have following problem. When removing USB dongle 07d1:3e01 or >> >> SierraWireless MC7304 I get following messages: >> >> >> >> option1 ttyUSB10: option_instat_callback: error -71 >> >> option1 ttyUSB9: option_instat_callback: error -71 >> >> option1 ttyUSB10: option_instat_callback: error -71 >> >> option1 ttyUSB9: option_instat_callback: error -71 >> >> option1 ttyUSB10: option_instat_callback: error -71 >> >> option1 ttyUSB9: option_instat_callback: error -71 >> >> INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, >> >> t=2102 jiffies, g=694, c=693, q=24) >> >> INFO: Stall ended before state dump start >> >> option1 ttyUSB10: option_instat_callback: error -71 >> >> >> >> drivers/usb/serial/option.c seems to make nothing with such a status >> >> and just prints error. How one would handle this properly and just >> >> unregister device? Do you need more info? >> >> >> >> Tested kernels: 3.18.20 and 4.2.0-rc5 (this kernel shows only RCU stall crash) >> >> Hardware: TI am335x >> > >> > >> > Isn't the device unregistered? What else can be done here? >> >> The problem is, that the system is dead (stall). It only prints >> "option1 ttyUSB10: option_instat_callback: error -71" endlessly >> (kernel 3.18.20) and console shows no reaction for input. And when you >> start watchdog from userspace the systems reboots after specified >> timeout (watchdog -t 5 -T 10 /dev/watchdog). > > Related issues have been reported with musb (BeagleBone Black) and dwc > controllers (RPi) when using an external hub. > > The exact reasons have not yet been fully determined, including whether > it is the hub driver that is prevented from processing the event or if > the disconnect event is never received at all. See for example: > > https://lkml.kernel.org/r/20150409171314.GA30002@localhost > > In the cases I've seen, changing the timing slightly was enough to make > the problem go away. Specifically, avoiding a large, unnecessary memcpy > or silencing an error message was sufficient. > > This obviously does not solve the underlying issue, but could you try > the below patch nonetheless? We don't want these error messages for the > "valid" use case of removing an in-use device anyway. > > Thanks, > Johan > > > From 81913b6a91bd53b4f00455618c141c35148acbfc Mon Sep 17 00:00:00 2001 > From: Johan Hovold <johan@xxxxxxxxxx> > Date: Mon, 17 Aug 2015 16:48:47 +0200 > Subject: [PATCH] USB: option: silence interrupt errors > > Avoid spamming the logs (e.g. with -EPROTO errors) when attempting to > resubmit the interrupt urb while a disconnect of an in-use device is > being processed. > > Signed-off-by: Johan Hovold <johan@xxxxxxxxxx> > --- > drivers/usb/serial/option.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c > index 983c911b3986..3992f7709b70 100644 > --- a/drivers/usb/serial/option.c > +++ b/drivers/usb/serial/option.c > @@ -1942,7 +1942,7 @@ static void option_instat_callback(struct urb *urb) > } else if (status == -ENOENT || status == -ESHUTDOWN) { > dev_dbg(dev, "%s: urb stopped: %d\n", __func__, status); > } else > - dev_err(dev, "%s: error %d\n", __func__, status); > + dev_dbg(dev, "%s: error %d\n", __func__, status); > > /* Resubmit urb so we continue receiving IRQ data */ > if (status != -ESHUTDOWN && status != -ENOENT) { > -- > 2.4.6 sorry for delay. Had no access to the hardware for some time. First of all error messages with timing info: [ 37.106216] option1 ttyUSB10: option_instat_callback: error -71 [ 37.112555] option1 ttyUSB9: option_instat_callback: error -71 [ 37.118843] option1 ttyUSB10: option_instat_callback: error -71 [ 37.125182] option1 ttyUSB9: option_instat_callback: error -71 [ 37.131507] option1 ttyUSB10: option_instat_callback: error -71 [ 37.137893] option1 ttyUSB9: option_instat_callback: error -71 [ 37.144141] option1 ttyUSB10: option_instat_callback: error -71 [ 37.150521] option1 ttyUSB9: option_instat_callback: error -71 [ 37.156810] option1 ttyUSB10: option_instat_callback: error -71 Applying your patch doesn't help, i.e. error messages disappear, but I get following crash anyway (kernel 3.18.20): [ 71.855731] INFO: rcu_sched self-detected stall on CPU { 0} (t=2100 jiffies g=951 c=950 q=7) [ 71.864822] Task dump for CPU 0: [ 71.868225] onrisctool R running 0 876 837 0x00000002 [ 71.874981] [<c0015b0c>] (unwind_backtrace) from [<c001229c>] (show_stack+0x10/0x14) [ 71.883138] [<c001229c>] (show_stack) from [<c0090944>] (rcu_dump_cpu_stacks+0x84/0xc8) [ 71.891563] [<c0090944>] (rcu_dump_cpu_stacks) from [<c0093ddc>] (rcu_check_callbacks+0x424/0x720) [ 71.900996] [<c0093ddc>] (rcu_check_callbacks) from [<c00986c8>] (update_process_times+0x40/0x60) [ 71.910333] [<c00986c8>] (update_process_times) from [<c00a8a90>] (tick_sched_handle.isra.17+0x44/0x50) [ 71.920216] [<c00a8a90>] (tick_sched_handle.isra.17) from [<c00a8af8>] (tick_sched_timer+0x5c/0xa8) [ 71.929734] [<c00a8af8>] (tick_sched_timer) from [<c0098ec4>] (__run_hrtimer+0x84/0x1f8) [ 71.938250] [<c0098ec4>] (__run_hrtimer) from [<c00999f8>] (hrtimer_interrupt+0x130/0x2dc) [ 71.946951] [<c00999f8>] (hrtimer_interrupt) from [<c0026ad8>] (omap2_gp_timer_interrupt+0x24/0x34) [ 71.956471] [<c0026ad8>] (omap2_gp_timer_interrupt) from [<c0089854>] (handle_irq_event_percpu+0x3c/0x1d8) [ 71.966627] [<c0089854>] (handle_irq_event_percpu) from [<c0089a2c>] (handle_irq_event+0x3c/0x5c) [ 71.975964] [<c0089a2c>] (handle_irq_event) from [<c008c5fc>] (handle_level_irq+0xb4/0x144) [ 71.984751] [<c008c5fc>] (handle_level_irq) from [<c0088f2c>] (generic_handle_irq+0x28/0x3c) [ 71.993628] [<c0088f2c>] (generic_handle_irq) from [<c0089210>] (__handle_domain_irq+0x64/0xc8) [ 72.002780] [<c0089210>] (__handle_domain_irq) from [<c0008780>] (omap_intc_handle_irq+0xb4/0xc4) [ 72.012125] [<c0008780>] (omap_intc_handle_irq) from [<c05d5ea4>] (__irq_svc+0x44/0x5c) [ 72.020540] Exception stack(0xcf7b9d68 to 0xcf7b9db0) [ 72.025858] 9d60: 00000001 cf7e1960 00000000 cf7e1400 c08b2080 c0043eac [ 72.034462] 9d80: 00000202 c094b56c cf7b8000 00000001 cf005000 00000000 00000001 cf7b9db0 [ 72.043061] 9da0: c007cfe4 c004395c 60000113 ffffffff [ 72.048389] [<c05d5ea4>] (__irq_svc) from [<c004395c>] (__do_softirq+0xc8/0x330) [ 72.056173] [<c004395c>] (__do_softirq) from [<c0043eac>] (irq_exit+0xb0/0x104) [ 72.063865] [<c0043eac>] (irq_exit) from [<c008921c>] (__handle_domain_irq+0x70/0xc8) [ 72.072104] [<c008921c>] (__handle_domain_irq) from [<c0008780>] (omap_intc_handle_irq+0xb4/0xc4) [ 72.081439] [<c0008780>] (omap_intc_handle_irq) from [<c05d5ea4>] (__irq_svc+0x44/0x5c) [ 72.089853] Exception stack(0xcf7b9e40 to 0xcf7b9e88) [ 72.095171] 9e40: 00000001 cf7e1960 00000000 cf7e1400 cd5ba4bc cf7d46c0 cd5ba4ec cd5ba4ac [ 72.103774] 9e60: cf7b9ed8 00000000 cf5ae200 00000000 00000001 cf7b9e88 c007cfe4 c05d54e8 [ 72.112371] 9e80: 20000013 ffffffff [ 72.116051] [<c05d5ea4>] (__irq_svc) from [<c05d54e8>] (_raw_spin_unlock_irq+0x28/0x2c) [ 72.124486] [<c05d54e8>] (_raw_spin_unlock_irq) from [<c010f7e0>] (unlink_file_vma+0x38/0x44) [ 72.133458] [<c010f7e0>] (unlink_file_vma) from [<c010a350>] (free_pgtables+0x80/0xcc) [ 72.141792] [<c010a350>] (free_pgtables) from [<c0111ff4>] (exit_mmap+0xec/0x1f4) [ 72.149667] [<c0111ff4>] (exit_mmap) from [<c003da94>] (mmput+0x44/0xec) [ 72.156719] [<c003da94>] (mmput) from [<c0041784>] (do_exit+0x2a4/0x988) [ 72.163772] [<c0041784>] (do_exit) from [<c0042d90>] (do_group_exit+0x4c/0xb8) [ 72.171373] [<c0042d90>] (do_group_exit) from [<c0042e0c>] (__wake_up_parent+0x0/0x18) And here 4.2-rc5 crash output: [ 70.076862] INFO: rcu_sched self-detected stall on CPU [ 70.082303] 0: (2600 ticks this GP) idle=90d/140000000000002/0 softirq=6058/6058 fqs=0 [ 70.090809] (t=2600 jiffies g=1441 c=1440 q=32) [ 70.095772] rcu_sched kthread starved for 2600 jiffies! g1441 c1440 f0x0 [ 70.102890] Task dump for CPU 0: [ 70.106287] onrisctool R running 0 888 854 0x00000002 [ 70.113033] [<c0016ecc>] (unwind_backtrace) from [<c00133d0>] (show_stack+0x10/0x14) [ 70.121191] [<c00133d0>] (show_stack) from [<c009c858>] (rcu_dump_cpu_stacks+0x84/0xc8) [ 70.129617] [<c009c858>] (rcu_dump_cpu_stacks) from [<c00a018c>] (rcu_check_callbacks+0x570/0x8f4) [ 70.139052] [<c00a018c>] (rcu_check_callbacks) from [<c00a4f8c>] (update_process_times+0x38/0x64) [ 70.148396] [<c00a4f8c>] (update_process_times) from [<c00b60cc>] (tick_sched_handle.isra.17+0x44/0x50) [ 70.158280] [<c00b60cc>] (tick_sched_handle.isra.17) from [<c00b6134>] (tick_sched_timer+0x5c/0xa8) [ 70.167797] [<c00b6134>] (tick_sched_timer) from [<c00a5830>] (__hrtimer_run_queues+0xec/0x2dc) [ 70.176949] [<c00a5830>] (__hrtimer_run_queues) from [<c00a626c>] (hrtimer_interrupt+0xac/0x200) [ 70.186199] [<c00a626c>] (hrtimer_interrupt) from [<c0026fc4>] (omap2_gp_timer_interrupt+0x24/0x34) [ 70.195722] [<c0026fc4>] (omap2_gp_timer_interrupt) from [<c0093dd4>] (handle_irq_event_percpu+0x40/0x1e8) [ 70.205878] [<c0093dd4>] (handle_irq_event_percpu) from [<c0093fbc>] (handle_irq_event+0x40/0x64) [ 70.215215] [<c0093fbc>] (handle_irq_event) from [<c0096d54>] (handle_level_irq+0xb8/0x150) [ 70.224004] [<c0096d54>] (handle_level_irq) from [<c0093470>] (generic_handle_irq+0x28/0x3c) [ 70.232882] [<c0093470>] (generic_handle_irq) from [<c0093758>] (__handle_domain_irq+0x64/0xe0) [ 70.242035] [<c0093758>] (__handle_domain_irq) from [<c00094ac>] (omap_intc_handle_irq+0xa8/0xb8) [ 70.251374] [<c00094ac>] (omap_intc_handle_irq) from [<c066e264>] (__irq_svc+0x44/0x5c) [ 70.259790] Exception stack(0xcf58dcf8 to 0xcf58dd40) [ 70.265105] dce0: 00000001 00000000 [ 70.273708] dd00: 00000000 cf4900c0 cf506640 a0000113 00000000 cf69c5a8 cf69c588 c095d6f4 [ 70.282311] dd20: cf58c000 00000000 00000000 cf58dd40 c00868c0 c03f73d8 60000113 ffffffff [ 70.290927] [<c066e264>] (__irq_svc) from [<c03f73d8>] (__usb_hcd_giveback_urb+0xc4/0x120) [ 70.299625] [<c03f73d8>] (__usb_hcd_giveback_urb) from [<c03f7fe8>] (usb_giveback_urb_bh+0x84/0xc0) [ 70.309150] [<c03f7fe8>] (usb_giveback_urb_bh) from [<c0044378>] (tasklet_action+0x88/0x128) [ 70.318029] [<c0044378>] (tasklet_action) from [<c004372c>] (__do_softirq+0xcc/0x34c) [ 70.326268] [<c004372c>] (__do_softirq) from [<c0043ccc>] (irq_exit+0xbc/0x130) [ 70.333960] [<c0043ccc>] (irq_exit) from [<c0093760>] (__handle_domain_irq+0x6c/0xe0) [ 70.342199] [<c0093760>] (__handle_domain_irq) from [<c00094ac>] (omap_intc_handle_irq+0xa8/0xb8) [ 70.351532] [<c00094ac>] (omap_intc_handle_irq) from [<c066e264>] (__irq_svc+0x44/0x5c) [ 70.359946] Exception stack(0xcf58de18 to 0xcf58de60) [ 70.365259] de00: 00000001 00000001 [ 70.373861] de20: 00000000 cf4900c0 c095c93c cf000f40 cf5ad030 c012ad20 a0000013 0f346000 [ 70.382463] de40: cfca293c cf43c274 00000000 cf58de60 c0086944 c0137734 20000013 ffffffff [ 70.391077] [<c066e264>] (__irq_svc) from [<c0137734>] (kmem_cache_free+0xa4/0x1d4) [ 70.399145] [<c0137734>] (kmem_cache_free) from [<c012ad20>] (__put_anon_vma+0x28/0xc0) [ 70.407569] [<c012ad20>] (__put_anon_vma) from [<c012b0ec>] (unlink_anon_vmas+0x1b4/0x22c) [ 70.416267] [<c012b0ec>] (unlink_anon_vmas) from [<c0120d84>] (free_pgtables+0x78/0xcc) [ 70.424689] [<c0120d84>] (free_pgtables) from [<c0128640>] (exit_mmap+0xec/0x1d8) [ 70.432564] [<c0128640>] (exit_mmap) from [<c003d01c>] (mmput+0x44/0xec) [ 70.439615] [<c003d01c>] (mmput) from [<c0041004>] (do_exit+0x2b0/0x9a0) [ 70.446667] [<c0041004>] (do_exit) from [<c00429f4>] (do_group_exit+0x4c/0xc4) [ 70.454268] [<c00429f4>] (do_group_exit) from [<c0042a7c>] (__wake_up_parent+0x0/0x18) Yegor -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html