Bug 110531 - sysfs read_descriptors() can be blocked up to 30 seconds due to USB error recovery in hub_event()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Here is a summary of the kernel ticket that I created at https://bugzilla.kernel.org/show_bug.cgi?id=110531

Before kernel 3.13, sysfs read_descriptors() was non-blocking and in kernel 3.13 the following commit was added
"232275a USB: fix substandard locking for the sysfs files"
which added usb_lock_device(udev) which can cause blocking to read_descriptors().

Blocking occurs when sysfs read_descriptors' udev == hub_event's hdev value.

If USB comms have failed in hub_port_init() then the blocking can be up to 30 seconds due to the 6 retry attempts at re-establishing USB comms with a USB device.

This period of mutex blocking can be too long for applications as the application's thread for reading sysfs hangs until usb_unlock_device(hdev) is called in hub_event().

This issue was observed in our embedded ARM based commercial project that had an application that was reading the descriptors via sysfs to determine whether the hub and devices supported OTG mode. The system had a check to see whether the application got to a "normal" system state within 25 seconds of power-up.

In the failure scenario, a malfunctioning USB stick was connected (present at power-up) to the ARM system. The USB stick worked OK for 10 seconds but failed causing usb_events() (ARM was based on kernel 3.14) to cause hub_init_port() to attempt up to 6 times to try to re-establish USB communications, this can take up to 30 seconds to complete all attempts.

Whilst the connection attempts are being attempted, hub_events() holds a device lock referenced by hdev. However, sysfs read_descriptors() can be blocked on this held lock when udev == hdev of hub_events(). When read_descriptors() is blocked, this blocking can be up to 30 seconds.

In our project, a blocked read_descriptors() thread causes the sysfs read thread in the application to hang which prevents the application from reaching the "normal" state within the 25 second constraint set by the project and subsequently the management entity triggers a controlled system reboot. The board cyclically reset as the USB stick failed on each power-up. For the project, this behaviour is undesirable.

This issue is reproducible on x86 kernel 4.3.2 by adding test code.

Testcase:
=========

The tarball kernel_x86_4.3.2_test_patches.tar.gz contains the following 3 test patches (not a fix)

0001-USB-TEST-Debug-to-show-locking-in-read_descriptors.patch
0002-USB-TEST-spoof-hub_port_init-descriptors-timeout.patch
0003-USB-TEST-show-locking-of-hub_event.patch

The patches are for kernel 4.3.2 but should easily be ported to other kernels.

The testcase causes 5 of the 6 connection attempts in hub_port_init() to spoof a 5 second timeout as if the USB read command timed out. This represents a malfunctioning USB stick that was previously successfully enumerated and working but then failed. OK, the spoofed failure is not identical to the real-world failure but it demonstrates the blocking issue.

The testcase does not fail all 6 attempts because this would prevent USB devices from working on the test system. In the real-world failure, all 6 attempts fail.

The test consists of:

1. Connect a USB stick to usb2
2. Quickly run
hexdump /sys/bus/usb/devices/2-1/descriptors
3. It will be observed that hexdump hangs for up to 25 seconds
4. hexdump output is
0000000 0112 0200 0009 4001 8087 0020 0000 0000
0000010 0100 0209 0019 0101 e000 0900 0004 0100
0000020 0009 0000 0507 0381 0002 000c
000002b

The hexdump hangs because sysfs read_descriptors() is blocked by hub_event() due to the hub_init_port() connection attempts.

To run the test, adjustment to the hexdump command line will be necessary to select the right root hub (I think that is what 2-1 means)

Results:
========

Here is an example dmesg log output of a x86 kernel 4.3.2 patched with the kernel_x86_4.3.2_test_patches.tar.gz patches.

The output of the test code shows the blocking issue:

***************************
Inserted USB stick on 2-1.3
***************************
[ 1396.253093] hub 2-1:1.0: hub_event ffff88013292e800 get lock
[ 1396.253098] hub 2-1:1.0: hub_event ffff88013292e800 got lock
[ 1396.429922] usb 2-1.3: new high-speed USB device number 6 using ehci-pci
[ 1396.430186] usb 2-1.3: spoof descriptors timeout 0 0
********************************************
hexdump /sys/bus/usb/devices/2-1/descriptors
********************************************
[ 1397.916238] usb 2-1: read_descriptors ffff88013292e800 get lock
******************************************************************************************
"read_descriptors() for 2-1 gets blocked for 1421.715514 - 1397.916238 = 23.8 seconds !!!"
******************************************************************************************
[ 1401.434169] usb 2-1.3: spoof descriptors timeout 0 1
[ 1406.438131] usb 2-1.3: spoof descriptors timeout 0 2
[ 1411.514910] usb 2-1.3: device descriptor read/64, error -110
[ 1411.616330] usb 2-1.3: spoof descriptors timeout 1 0
[ 1416.620267] usb 2-1.3: spoof descriptors timeout 1 1
[ 1421.712665] usb 2-1.3: New USB device found, idVendor=0930, idProduct=6544 [ 1421.712669] usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1421.712671] usb 2-1.3: Product: TransMemory
[ 1421.712673] usb 2-1.3: Manufacturer: TOSHIBA
[ 1421.712674] usb 2-1.3: SerialNumber: 354C0394B317CD91CFF7D9C2
[ 1421.713039] usb-storage 2-1.3:1.0: USB Mass Storage device detected
[ 1421.713438] usb 2-1.3: read_descriptors ffff8800a58aa800 get lock
[ 1421.713592] scsi host8: usb-storage 2-1.3:1.0
[ 1421.713698] usb 2-1.3: read_descriptors ffff8800a58aa800 got lock
[ 1421.713703] usb 2-1.3: read_descriptors ffff8800a58aa800 released lock
[ 1421.713714] hub 2-1:1.0: hub_event ffff88013292e800 released lock
**************************************************************************************************************************
"hub_event() releases the hdev lock and allows read_descriptors to get the udev lock. Note hdev value matches udev value."
**************************************************************************************************************************
[ 1421.715514] usb 2-1: read_descriptors ffff88013292e800 got lock
[ 1421.715521] usb 2-1: read_descriptors ffff88013292e800 released lock
[ 1421.715580] usb 2-1: read_descriptors ffff88013292e800 get lock
[ 1421.715583] usb 2-1: read_descriptors ffff88013292e800 got lock
[ 1421.715586] usb 2-1: read_descriptors ffff88013292e800 released lock
[ 1421.715590] usb 2-1: read_descriptors ffff88013292e800 get lock
[ 1421.715592] usb 2-1: read_descriptors ffff88013292e800 got lock
[ 1421.715594] usb 2-1: read_descriptors ffff88013292e800 released lock
"hexdump output is
0000000 0112 0200 0009 4001 8087 0020 0000 0000
0000010 0100 0209 0019 0101 e000 0900 0004 0100
0000020 0009 0000 0507 0381 0002 000c
000002b"

Issue:
======

It can be seen that sysfs read_descriptors() is accessing the root hub's (I think) descriptor information but the USB communications issue is for the USB device connected to the root hub. In other words, the read_descriptors() blocking is not occurring on the device that is malfunctioning.

The root hub's descriptor information is hard-coded (I think) and therefore read_descriptors() does not need locking to read this information. This is because the sysfs vfs entry's lifetime is bracketed by the root hub being present.

In fact, I suspect the locking added by the kernel 3.13 commit for read_descriptors() is invalid because read_descriptors() performs no USB activity; read_descriptors() just reads information from an allocated memory structure. This structure is protected as the structure is existing before and after the sysfs vfs descriptors entry is created or destroyed. The information is only written at the time of enumeration and does not change. At least that is my understanding.

It is noted that in our testing of kernel 3.8 on ARM, that sysfs read_descriptors() was non-blocking because the kernel 3.13 comment was not there.

The pre-kernel 3.13 sysfs read_descriptors() seemed to work OK.

Proposal:
=========

Remove the usb_lock_device(udev) and usb_unlock_device(udev) from devices/usb/core/sysfs.c in read_descriptors() that was added by the kernel 3.13 commit
"232275a USB: fix substandard locking for the sysfs files"

Any comments to this proposal ?

Regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

Attachment: kernel_x86_4.3.2_test_patches.tar.gz
Description: application/gzip


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux