Re: option driver crashes on modem removal

Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> · Mon, 24 Aug 2015 11:14:13 +0200

Hi Johan,

On Mon, Aug 17, 2015 at 5:29 PM, Johan Hovold <johan@xxxxxxxxxx> wrote:
> On Tue, Aug 11, 2015 at 12:31:47PM +0200, Yegor Yefremov wrote:
>> On Tue, Aug 11, 2015 at 11:58 AM, Bjørn Mork <bjorn@xxxxxxx> wrote:
>> > [replaced 'netdev' with 'linux-usb' as this concerns a USB serial driver only]
>> >
>> > Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> writes:
>> >
>> >> I have following problem. When removing USB dongle 07d1:3e01 or
>> >> SierraWireless MC7304 I get following messages:
>> >>
>> >> option1 ttyUSB10: option_instat_callback: error -71
>> >> option1 ttyUSB9: option_instat_callback: error -71
>> >> option1 ttyUSB10: option_instat_callback: error -71
>> >> option1 ttyUSB9: option_instat_callback: error -71
>> >> option1 ttyUSB10: option_instat_callback: error -71
>> >> option1 ttyUSB9: option_instat_callback: error -71
>> >> INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0,
>> >> t=2102 jiffies, g=694, c=693, q=24)
>> >> INFO: Stall ended before state dump start
>> >> option1 ttyUSB10: option_instat_callback: error -71
>> >>
>> >> drivers/usb/serial/option.c seems to make nothing with such a status
>> >> and just prints error. How one would handle this properly and just
>> >> unregister device? Do you need more info?
>> >>
>> >> Tested kernels: 3.18.20 and 4.2.0-rc5 (this kernel shows only RCU stall crash)
>> >> Hardware: TI am335x
>> >
>> >
>> > Isn't the device unregistered?  What else can be done here?
>>
>> The problem is, that the system is dead (stall). It only prints
>> "option1 ttyUSB10: option_instat_callback: error -71" endlessly
>> (kernel 3.18.20) and console shows no reaction for input. And when you
>> start watchdog from userspace the systems reboots after specified
>> timeout (watchdog -t 5 -T 10 /dev/watchdog).
>
> Related issues have been reported with musb (BeagleBone Black) and dwc
> controllers (RPi) when using an external hub.
>
> The exact reasons have not yet been fully determined, including whether
> it is the hub driver that is prevented from processing the event or if
> the disconnect event is never received at all. See for example:
>
>         https://lkml.kernel.org/r/20150409171314.GA30002@localhost
>
> In the cases I've seen, changing the timing slightly was enough to make
> the problem go away. Specifically, avoiding a large, unnecessary memcpy
> or silencing an error message was sufficient.
>
> This obviously does not solve the underlying issue, but could you try
> the below patch nonetheless? We don't want these error messages for the
> "valid" use case of removing an in-use device anyway.
>
> Thanks,
> Johan
>
>
> From 81913b6a91bd53b4f00455618c141c35148acbfc Mon Sep 17 00:00:00 2001
> From: Johan Hovold <johan@xxxxxxxxxx>
> Date: Mon, 17 Aug 2015 16:48:47 +0200
> Subject: [PATCH] USB: option: silence interrupt errors
>
> Avoid spamming the logs (e.g. with -EPROTO errors) when attempting to
> resubmit the interrupt urb while a disconnect of an in-use device is
> being processed.
>
> Signed-off-by: Johan Hovold <johan@xxxxxxxxxx>
> ---
>  drivers/usb/serial/option.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
> index 983c911b3986..3992f7709b70 100644
> --- a/drivers/usb/serial/option.c
> +++ b/drivers/usb/serial/option.c
> @@ -1942,7 +1942,7 @@ static void option_instat_callback(struct urb *urb)
>         } else if (status == -ENOENT || status == -ESHUTDOWN) {
>                 dev_dbg(dev, "%s: urb stopped: %d\n", __func__, status);
>         } else
> -               dev_err(dev, "%s: error %d\n", __func__, status);
> +               dev_dbg(dev, "%s: error %d\n", __func__, status);
>
>         /* Resubmit urb so we continue receiving IRQ data */
>         if (status != -ESHUTDOWN && status != -ENOENT) {
> --
> 2.4.6

sorry for delay. Had no access to the hardware for some time. First of
all error messages with timing info:

[   37.106216] option1 ttyUSB10: option_instat_callback: error -71
[   37.112555] option1 ttyUSB9: option_instat_callback: error -71
[   37.118843] option1 ttyUSB10: option_instat_callback: error -71
[   37.125182] option1 ttyUSB9: option_instat_callback: error -71
[   37.131507] option1 ttyUSB10: option_instat_callback: error -71
[   37.137893] option1 ttyUSB9: option_instat_callback: error -71
[   37.144141] option1 ttyUSB10: option_instat_callback: error -71
[   37.150521] option1 ttyUSB9: option_instat_callback: error -71
[   37.156810] option1 ttyUSB10: option_instat_callback: error -71

Applying your patch doesn't help, i.e. error messages disappear, but I
get following crash anyway (kernel 3.18.20):

[   71.855731] INFO: rcu_sched self-detected stall on CPU { 0}
(t=2100 jiffies g=951 c=950 q=7)
[   71.864822] Task dump for CPU 0:
[   71.868225] onrisctool      R running      0   876    837 0x00000002
[   71.874981] [<c0015b0c>] (unwind_backtrace) from [<c001229c>]
(show_stack+0x10/0x14)
[   71.883138] [<c001229c>] (show_stack) from [<c0090944>]
(rcu_dump_cpu_stacks+0x84/0xc8)
[   71.891563] [<c0090944>] (rcu_dump_cpu_stacks) from [<c0093ddc>]
(rcu_check_callbacks+0x424/0x720)
[   71.900996] [<c0093ddc>] (rcu_check_callbacks) from [<c00986c8>]
(update_process_times+0x40/0x60)
[   71.910333] [<c00986c8>] (update_process_times) from [<c00a8a90>]
(tick_sched_handle.isra.17+0x44/0x50)
[   71.920216] [<c00a8a90>] (tick_sched_handle.isra.17) from
[<c00a8af8>] (tick_sched_timer+0x5c/0xa8)
[   71.929734] [<c00a8af8>] (tick_sched_timer) from [<c0098ec4>]
(__run_hrtimer+0x84/0x1f8)
[   71.938250] [<c0098ec4>] (__run_hrtimer) from [<c00999f8>]
(hrtimer_interrupt+0x130/0x2dc)
[   71.946951] [<c00999f8>] (hrtimer_interrupt) from [<c0026ad8>]
(omap2_gp_timer_interrupt+0x24/0x34)
[   71.956471] [<c0026ad8>] (omap2_gp_timer_interrupt) from
[<c0089854>] (handle_irq_event_percpu+0x3c/0x1d8)
[   71.966627] [<c0089854>] (handle_irq_event_percpu) from
[<c0089a2c>] (handle_irq_event+0x3c/0x5c)
[   71.975964] [<c0089a2c>] (handle_irq_event) from [<c008c5fc>]
(handle_level_irq+0xb4/0x144)
[   71.984751] [<c008c5fc>] (handle_level_irq) from [<c0088f2c>]
(generic_handle_irq+0x28/0x3c)
[   71.993628] [<c0088f2c>] (generic_handle_irq) from [<c0089210>]
(__handle_domain_irq+0x64/0xc8)
[   72.002780] [<c0089210>] (__handle_domain_irq) from [<c0008780>]
(omap_intc_handle_irq+0xb4/0xc4)
[   72.012125] [<c0008780>] (omap_intc_handle_irq) from [<c05d5ea4>]
(__irq_svc+0x44/0x5c)
[   72.020540] Exception stack(0xcf7b9d68 to 0xcf7b9db0)
[   72.025858] 9d60:                   00000001 cf7e1960 00000000
cf7e1400 c08b2080 c0043eac
[   72.034462] 9d80: 00000202 c094b56c cf7b8000 00000001 cf005000
00000000 00000001 cf7b9db0
[   72.043061] 9da0: c007cfe4 c004395c 60000113 ffffffff
[   72.048389] [<c05d5ea4>] (__irq_svc) from [<c004395c>]
(__do_softirq+0xc8/0x330)
[   72.056173] [<c004395c>] (__do_softirq) from [<c0043eac>]
(irq_exit+0xb0/0x104)
[   72.063865] [<c0043eac>] (irq_exit) from [<c008921c>]
(__handle_domain_irq+0x70/0xc8)
[   72.072104] [<c008921c>] (__handle_domain_irq) from [<c0008780>]
(omap_intc_handle_irq+0xb4/0xc4)
[   72.081439] [<c0008780>] (omap_intc_handle_irq) from [<c05d5ea4>]
(__irq_svc+0x44/0x5c)
[   72.089853] Exception stack(0xcf7b9e40 to 0xcf7b9e88)
[   72.095171] 9e40: 00000001 cf7e1960 00000000 cf7e1400 cd5ba4bc
cf7d46c0 cd5ba4ec cd5ba4ac
[   72.103774] 9e60: cf7b9ed8 00000000 cf5ae200 00000000 00000001
cf7b9e88 c007cfe4 c05d54e8
[   72.112371] 9e80: 20000013 ffffffff
[   72.116051] [<c05d5ea4>] (__irq_svc) from [<c05d54e8>]
(_raw_spin_unlock_irq+0x28/0x2c)
[   72.124486] [<c05d54e8>] (_raw_spin_unlock_irq) from [<c010f7e0>]
(unlink_file_vma+0x38/0x44)
[   72.133458] [<c010f7e0>] (unlink_file_vma) from [<c010a350>]
(free_pgtables+0x80/0xcc)
[   72.141792] [<c010a350>] (free_pgtables) from [<c0111ff4>]
(exit_mmap+0xec/0x1f4)
[   72.149667] [<c0111ff4>] (exit_mmap) from [<c003da94>] (mmput+0x44/0xec)
[   72.156719] [<c003da94>] (mmput) from [<c0041784>] (do_exit+0x2a4/0x988)
[   72.163772] [<c0041784>] (do_exit) from [<c0042d90>]
(do_group_exit+0x4c/0xb8)
[   72.171373] [<c0042d90>] (do_group_exit) from [<c0042e0c>]
(__wake_up_parent+0x0/0x18)

And here 4.2-rc5 crash output:

[   70.076862] INFO: rcu_sched self-detected stall on CPU
[   70.082303]  0: (2600 ticks this GP) idle=90d/140000000000002/0
softirq=6058/6058 fqs=0
[   70.090809]   (t=2600 jiffies g=1441 c=1440 q=32)
[   70.095772] rcu_sched kthread starved for 2600 jiffies! g1441 c1440 f0x0
[   70.102890] Task dump for CPU 0:
[   70.106287] onrisctool      R running      0   888    854 0x00000002
[   70.113033] [<c0016ecc>] (unwind_backtrace) from [<c00133d0>]
(show_stack+0x10/0x14)
[   70.121191] [<c00133d0>] (show_stack) from [<c009c858>]
(rcu_dump_cpu_stacks+0x84/0xc8)
[   70.129617] [<c009c858>] (rcu_dump_cpu_stacks) from [<c00a018c>]
(rcu_check_callbacks+0x570/0x8f4)
[   70.139052] [<c00a018c>] (rcu_check_callbacks) from [<c00a4f8c>]
(update_process_times+0x38/0x64)
[   70.148396] [<c00a4f8c>] (update_process_times) from [<c00b60cc>]
(tick_sched_handle.isra.17+0x44/0x50)
[   70.158280] [<c00b60cc>] (tick_sched_handle.isra.17) from
[<c00b6134>] (tick_sched_timer+0x5c/0xa8)
[   70.167797] [<c00b6134>] (tick_sched_timer) from [<c00a5830>]
(__hrtimer_run_queues+0xec/0x2dc)
[   70.176949] [<c00a5830>] (__hrtimer_run_queues) from [<c00a626c>]
(hrtimer_interrupt+0xac/0x200)
[   70.186199] [<c00a626c>] (hrtimer_interrupt) from [<c0026fc4>]
(omap2_gp_timer_interrupt+0x24/0x34)
[   70.195722] [<c0026fc4>] (omap2_gp_timer_interrupt) from
[<c0093dd4>] (handle_irq_event_percpu+0x40/0x1e8)
[   70.205878] [<c0093dd4>] (handle_irq_event_percpu) from
[<c0093fbc>] (handle_irq_event+0x40/0x64)
[   70.215215] [<c0093fbc>] (handle_irq_event) from [<c0096d54>]
(handle_level_irq+0xb8/0x150)
[   70.224004] [<c0096d54>] (handle_level_irq) from [<c0093470>]
(generic_handle_irq+0x28/0x3c)
[   70.232882] [<c0093470>] (generic_handle_irq) from [<c0093758>]
(__handle_domain_irq+0x64/0xe0)
[   70.242035] [<c0093758>] (__handle_domain_irq) from [<c00094ac>]
(omap_intc_handle_irq+0xa8/0xb8)
[   70.251374] [<c00094ac>] (omap_intc_handle_irq) from [<c066e264>]
(__irq_svc+0x44/0x5c)
[   70.259790] Exception stack(0xcf58dcf8 to 0xcf58dd40)
[   70.265105] dce0:
    00000001 00000000
[   70.273708] dd00: 00000000 cf4900c0 cf506640 a0000113 00000000
cf69c5a8 cf69c588 c095d6f4
[   70.282311] dd20: cf58c000 00000000 00000000 cf58dd40 c00868c0
c03f73d8 60000113 ffffffff
[   70.290927] [<c066e264>] (__irq_svc) from [<c03f73d8>]
(__usb_hcd_giveback_urb+0xc4/0x120)
[   70.299625] [<c03f73d8>] (__usb_hcd_giveback_urb) from [<c03f7fe8>]
(usb_giveback_urb_bh+0x84/0xc0)
[   70.309150] [<c03f7fe8>] (usb_giveback_urb_bh) from [<c0044378>]
(tasklet_action+0x88/0x128)
[   70.318029] [<c0044378>] (tasklet_action) from [<c004372c>]
(__do_softirq+0xcc/0x34c)
[   70.326268] [<c004372c>] (__do_softirq) from [<c0043ccc>]
(irq_exit+0xbc/0x130)
[   70.333960] [<c0043ccc>] (irq_exit) from [<c0093760>]
(__handle_domain_irq+0x6c/0xe0)
[   70.342199] [<c0093760>] (__handle_domain_irq) from [<c00094ac>]
(omap_intc_handle_irq+0xa8/0xb8)
[   70.351532] [<c00094ac>] (omap_intc_handle_irq) from [<c066e264>]
(__irq_svc+0x44/0x5c)
[   70.359946] Exception stack(0xcf58de18 to 0xcf58de60)
[   70.365259] de00:
    00000001 00000001
[   70.373861] de20: 00000000 cf4900c0 c095c93c cf000f40 cf5ad030
c012ad20 a0000013 0f346000
[   70.382463] de40: cfca293c cf43c274 00000000 cf58de60 c0086944
c0137734 20000013 ffffffff
[   70.391077] [<c066e264>] (__irq_svc) from [<c0137734>]
(kmem_cache_free+0xa4/0x1d4)
[   70.399145] [<c0137734>] (kmem_cache_free) from [<c012ad20>]
(__put_anon_vma+0x28/0xc0)
[   70.407569] [<c012ad20>] (__put_anon_vma) from [<c012b0ec>]
(unlink_anon_vmas+0x1b4/0x22c)
[   70.416267] [<c012b0ec>] (unlink_anon_vmas) from [<c0120d84>]
(free_pgtables+0x78/0xcc)
[   70.424689] [<c0120d84>] (free_pgtables) from [<c0128640>]
(exit_mmap+0xec/0x1d8)
[   70.432564] [<c0128640>] (exit_mmap) from [<c003d01c>] (mmput+0x44/0xec)
[   70.439615] [<c003d01c>] (mmput) from [<c0041004>] (do_exit+0x2b0/0x9a0)
[   70.446667] [<c0041004>] (do_exit) from [<c00429f4>]
(do_group_exit+0x4c/0xc4)
[   70.454268] [<c00429f4>] (do_group_exit) from [<c0042a7c>]
(__wake_up_parent+0x0/0x18)

Yegor
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html