Re: option driver crashes on modem removal

Bjørn Mork <bjorn@xxxxxxx> · Tue, 11 Aug 2015 13:48:22 +0200

Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> writes:

> On Tue, Aug 11, 2015 at 11:58 AM, Bjørn Mork <bjorn@xxxxxxx> wrote:
>> [replaced 'netdev' with 'linux-usb' as this concerns a USB serial driver only]
>>
>> Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> writes:
>>
>>> I have following problem. When removing USB dongle 07d1:3e01 or
>>> SierraWireless MC7304 I get following messages:
>>>
>>> option1 ttyUSB10: option_instat_callback: error -71
>>> option1 ttyUSB9: option_instat_callback: error -71
>>> option1 ttyUSB10: option_instat_callback: error -71
>>> option1 ttyUSB9: option_instat_callback: error -71
>>> option1 ttyUSB10: option_instat_callback: error -71
>>> option1 ttyUSB9: option_instat_callback: error -71
>>> INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0,
>>> t=2102 jiffies, g=694, c=693, q=24)
>>> INFO: Stall ended before state dump start
>>> option1 ttyUSB10: option_instat_callback: error -71
>>>
>>> drivers/usb/serial/option.c seems to make nothing with such a status
>>> and just prints error. How one would handle this properly and just
>>> unregister device? Do you need more info?
>>>
>>> Tested kernels: 3.18.20 and 4.2.0-rc5 (this kernel shows only RCU stall crash)
>>> Hardware: TI am335x
>>
>>
>> Isn't the device unregistered?  What else can be done here?
>
> The problem is, that the system is dead (stall). It only prints
> "option1 ttyUSB10: option_instat_callback: error -71" endlessly
> (kernel 3.18.20) and console shows no reaction for input. And when you
> start watchdog from userspace the systems reboots after specified
> timeout (watchdog -t 5 -T 10 /dev/watchdog).

Ouch.  OK.  I don't understand exactly what's happening here.

I tried to reproduce the problem with debugging on and got different
results on my hardware.  Unplugging the modem with /dev/ttyUSB0 open:

Aug 11 13:33:29 nemi kernel: [388599.850164] usb 3-2: USB disconnect, device number 71
Aug 11 13:33:29 nemi kernel: [388599.852044] option_instat_callback: option1 ttyUSB0: option_instat_callback: urb ffff880017c5aa00 port ffff8801615cc000 has data ffff8800a5b37a00
Aug 11 13:33:29 nemi kernel: [388599.852052] option_instat_callback: option1 ttyUSB0: option_instat_callback: urb stopped: -108
Aug 11 13:33:29 nemi kernel: [388599.852612] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
Aug 11 13:33:29 nemi kernel: [388599.852632] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
Aug 11 13:33:29 nemi kernel: [388599.852643] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
Aug 11 13:33:29 nemi kernel: [388599.852653] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
Aug 11 13:33:29 nemi kernel: [388599.853334] option1 ttyUSB0: GSM modem (1-port) converter now disconnected from ttyUSB0
Aug 11 13:33:29 nemi kernel: [388599.853366] option 3-2:1.0: device disconnected
Aug 11 13:33:29 nemi kernel: [388599.853909] option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
Aug 11 13:33:29 nemi kernel: [388599.853958] option 3-2:1.1: device disconnected
Aug 11 13:33:29 nemi kernel: [388599.854453] option1 ttyUSB2: GSM modem (1-port) converter now disconnected from ttyUSB2
Aug 11 13:33:29 nemi kernel: [388599.854491] option 3-2:1.2: device disconnected
Aug 11 13:33:29 nemi kernel: [388599.854832] qmi_wwan 3-2:1.3 wwan1: unregister 'qmi_wwan' usb-0000:00:1d.7-2, WWAN/QMI device

I wonder if this is related to different platforms using different
errors for this event?  As you can see, I get ESHUTDOWN (-108) where you
got EPROTO (-71).  The driver resubmits the URB in the EPROTO case. And that's
probably why you end up with a dead system.  Although I would have
thought that the submit should immediately return an error, the fact
that you get multiple error messages for the same device proves that the
resubmit results in the callback being executed.  I guess it ends up in
a tight resubmit loop.
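
To make the suspected loop concrete, here is a small userspace model of it.
This is only a sketch under my assumptions, not code from option.c: the
names should_resubmit() and simulate_unplug() are made up, and the model
assumes that once the device is gone every resubmitted URB completes
immediately with the same status.  The only thing taken from the logs
above is that ESHUTDOWN stops the resubmit while EPROTO does not.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not in option.c): decide whether the status URB
 * should be resubmitted after completing with `status`, which is 0 or a
 * negative errno like urb->status.  ESHUTDOWN always stops; EPROTO stops
 * only if the caller asks for it to be treated as fatal.
 */
bool should_resubmit(int status, bool eproto_is_fatal)
{
	if (status == -ESHUTDOWN)
		return false;
	if (status == -EPROTO && eproto_is_fatal)
		return false;
	return true;
}

/*
 * Model of an unplugged device: each submit "completes" at once with the
 * same status, which runs the callback again.  Returns the number of
 * callback invocations, capped at max_calls so the model terminates
 * (the real system has no such cap).
 */
unsigned int simulate_unplug(int status, bool eproto_is_fatal,
			     unsigned int max_calls)
{
	unsigned int calls = 0;

	while (calls < max_calls) {
		calls++;		/* completion callback runs */
		if (!should_resubmit(status, eproto_is_fatal))
			break;		/* URB is not resubmitted */
	}
	return calls;
}
```

With ESHUTDOWN the model stops after one callback.  With EPROTO it runs
until the artificial cap unless EPROTO is treated as fatal, which matches
the endless "error -71" messages if the assumption about immediate
completion holds.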

I hope some of the USB experts can tell us what the correct behaviour is
here.  Should the driver treat EPROTO like ESHUTDOWN?  Or should the
host controller use ESHUTDOWN instead?

If so, what about other errors?  If the assumptions above are correct,
then it seems that any unhandled persistent error can send the driver
into a hard loop.  That doesn't seem right...
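
One generic way to guard against that, independent of which errno is
returned, would be to stop resubmitting after a run of consecutive
failures.  The sketch below is purely illustrative: struct resubmit_guard,
guard_allow_resubmit() and the threshold of 8 are invented here and are
not taken from any existing driver, and I am not claiming this is the
right fix.

```c
#include <stdbool.h>

/* Arbitrary illustrative threshold, not a value from any driver. */
#define MAX_CONSECUTIVE_ERRORS 8

/* Hypothetical per-port state tracking back-to-back failed completions. */
struct resubmit_guard {
	unsigned int consecutive_errors;
};

/*
 * Called from the completion path with the URB status (0 or a negative
 * errno).  Returns true if the URB may be resubmitted.  A successful
 * completion resets the counter; once MAX_CONSECUTIVE_ERRORS failures
 * have been seen in a row, resubmission is refused until a success.
 */
bool guard_allow_resubmit(struct resubmit_guard *g, int status)
{
	if (status == 0) {
		g->consecutive_errors = 0;
		return true;
	}
	if (g->consecutive_errors < MAX_CONSECUTIVE_ERRORS)
		g->consecutive_errors++;
	return g->consecutive_errors < MAX_CONSECUTIVE_ERRORS;
}
```

A transient error would still be retried a few times, but a persistent one
could no longer keep the callback running forever.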

Bjørn
