Re: option driver crashes on modem removal

On Tue, 11 Aug 2015, Bjørn Mork wrote:

> Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> writes:
> 
> > On Tue, Aug 11, 2015 at 11:58 AM, Bjørn Mork <bjorn@xxxxxxx> wrote:
> >> [replaced 'netdev' with 'linux-usb' as this concerns a USB serial driver only]
> >>
> >> Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> writes:
> >>
> >>> I have the following problem. When removing USB dongle 07d1:3e01 or
> >>> SierraWireless MC7304 I get the following messages:
> >>>
> >>> option1 ttyUSB10: option_instat_callback: error -71
> >>> option1 ttyUSB9: option_instat_callback: error -71
> >>> option1 ttyUSB10: option_instat_callback: error -71
> >>> option1 ttyUSB9: option_instat_callback: error -71
> >>> option1 ttyUSB10: option_instat_callback: error -71
> >>> option1 ttyUSB9: option_instat_callback: error -71
> >>> INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0,
> >>> t=2102 jiffies, g=694, c=693, q=24)
> >>> INFO: Stall ended before state dump start
> >>> option1 ttyUSB10: option_instat_callback: error -71
> >>>
> >>> drivers/usb/serial/option.c seems to do nothing with such a status
> >>> and just prints the error. How would one handle this properly and just
> >>> unregister the device? Do you need more info?
> >>>
> >>> Tested kernels: 3.18.20 and 4.2.0-rc5 (this kernel shows only RCU stall crash)
> >>> Hardware: TI am335x
> >>
> >>
> >> Isn't the device unregistered?  What else can be done here?
> >
> > The problem is that the system is dead (stall). It only prints
> > "option1 ttyUSB10: option_instat_callback: error -71" endlessly
> > (kernel 3.18.20) and the console shows no reaction to input. And when
> > you start the watchdog from userspace the system reboots after the
> > specified timeout (watchdog -t 5 -T 10 /dev/watchdog).
> 
> Ouch.  OK.  I don't understand exactly what's happening here,

It would be nice if Yegor's log messages included timestamps.

At any rate, the problem is that the kernel never detected that the
device had been disconnected.  This is normally handled by the hub
driver (and also by the host controller driver, if the device was
plugged directly into the host controller rather than into an
intermediate hub).  Disconnect handling does require a process context
to run, though, so if interrupts are taking up all the available CPU
time then it might not happen.

> I tried to reproduce the problem with debugging on and got different
> results on my hardware.  Unplugging the modem with /dev/ttyUSB0 open:
> 
> Aug 11 13:33:29 nemi kernel: [388599.850164] usb 3-2: USB disconnect, device number 71

See, on your system the kernel did detect the disconnection.

> Aug 11 13:33:29 nemi kernel: [388599.852044] option_instat_callback: option1 ttyUSB0: option_instat_callback: urb ffff880017c5aa00 port ffff8801615cc000 has data ffff8800a5b37a00
> Aug 11 13:33:29 nemi kernel: [388599.852052] option_instat_callback: option1 ttyUSB0: option_instat_callback: urb stopped: -108
> Aug 11 13:33:29 nemi kernel: [388599.852612] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
> Aug 11 13:33:29 nemi kernel: [388599.852632] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
> Aug 11 13:33:29 nemi kernel: [388599.852643] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
> Aug 11 13:33:29 nemi kernel: [388599.852653] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
> Aug 11 13:33:29 nemi kernel: [388599.853334] option1 ttyUSB0: GSM modem (1-port) converter now disconnected from ttyUSB0
> Aug 11 13:33:29 nemi kernel: [388599.853366] option 3-2:1.0: device disconnected
> Aug 11 13:33:29 nemi kernel: [388599.853909] option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
> Aug 11 13:33:29 nemi kernel: [388599.853958] option 3-2:1.1: device disconnected
> Aug 11 13:33:29 nemi kernel: [388599.854453] option1 ttyUSB2: GSM modem (1-port) converter now disconnected from ttyUSB2
> Aug 11 13:33:29 nemi kernel: [388599.854491] option 3-2:1.2: device disconnected
> Aug 11 13:33:29 nemi kernel: [388599.854832] qmi_wwan 3-2:1.3 wwan1: unregister 'qmi_wwan' usb-0000:00:1d.7-2, WWAN/QMI device
> 
> 
> I wonder if this is related to different platforms using different
> errors for this event?  As you can see, I get ESHUTDOWN where you got
> EPROTO. The driver resubmits the URB in the EPROTO case. And that's
> probably why you end up with a dead system.  Although I would have
> thought that the submit should immediately return an error, the fact
> that you get multiple error messages for the same device proves that the
> resubmit results in the callback being executed.  I guess it ends up in
> a tight resubmit loop.
> 
> I hope some of the USB experts can tell us what the correct behaviour is
> here.  Should the driver treat EPROTO like ESHUTDOWN?  Or should the
> host controller use ESHUTDOWN instead?

EPROTO should not be treated like ESHUTDOWN.  The return code should 
not be ESHUTDOWN unless the kernel has detected that the device was 
unplugged.

Note: USB error codes are listed in Documentation/usb/error-codes.txt.

> If so, what about other errors?  If the assumptions above are correct,
> then it seems that any unhandled persistent error can send the driver
> into a hard loop.  That doesn't seem right...

There are a few possible errors which can arise when a device fails to 
respond (for instance, because it has been unplugged): EPROTO, EILSEQ, 
and ETIME.  Host controller drivers differ on which ones they return, 
but drivers shouldn't care about the details -- as far as a driver is 
concerned, they all mean the same thing: Device failed to respond.
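
To illustrate, a completion handler could fold the three codes into a
single check.  This is only a minimal userspace sketch, not code from
option.c: the helper name urb_status_is_no_response() is hypothetical,
and it assumes URB status values are negative errno values, as in the
"error -71" messages above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not an existing kernel function): returns true
 * if an URB completion status is one of the codes that mean "device
 * failed to respond".  Which of the three a host controller driver
 * reports varies, so they are deliberately treated alike.
 */
static bool urb_status_is_no_response(int status)
{
	switch (status) {
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		return true;
	default:
		/* Includes 0 (success) and -ESHUTDOWN (disconnect detected). */
		return false;
	}
}
```

Note that -ESHUTDOWN is intentionally left out, since it should only
show up once the kernel has detected the unplug.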

Ideally drivers should slow down or pause their URB submissions if they 
get too many of these errors in quick succession, as Oliver suggested.  
In many situations it doesn't matter much, because sending an URB to a 
non-responsive or unplugged device won't return an error until at least 
a millisecond has elapsed.  But some host controllers might respond 
more quickly than this, which theoretically could lead to a tight 
resubmission loop.
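
One possible shape for such throttling, as a rough userspace sketch
only.  Everything here is an assumption made for illustration: the
struct, the function name, and the threshold of 8 are hypothetical and
not taken from any driver, and the sketch counts consecutive errors
rather than measuring how quickly they arrive.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed threshold, chosen arbitrarily for this sketch. */
#define MAX_QUICK_ERRORS 8

/* Hypothetical per-port state tracking back-to-back failures. */
struct resubmit_throttle {
	unsigned int consecutive_errors;
};

/*
 * Decide whether the completion handler should resubmit right away.
 * Returns true to resubmit immediately, false to pause or stop.
 */
static bool throttle_should_resubmit(struct resubmit_throttle *t, int status)
{
	if (status == 0) {
		/* A successful completion clears the error streak. */
		t->consecutive_errors = 0;
		return true;
	}

	if (status == -EPROTO || status == -EILSEQ || status == -ETIME) {
		/* "Device failed to respond": tolerate only a short streak. */
		if (++t->consecutive_errors >= MAX_QUICK_ERRORS)
			return false;
		return true;
	}

	/* Any other error (e.g. -ESHUTDOWN): do not resubmit here. */
	return false;
}
```

When this returns false for a no-response error, a real driver would
presumably defer the next submission to process context after a delay
instead of resubmitting from the completion handler, which would also
give the hub driver a chance to notice the disconnect.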

Alan Stern
