usb: dwc3: HC dies under high I/O load on Exynos5422

Hi all,

I've discovered that on recent kernels the xHCI controller on Odroid
HC2 dies when a USB-attached disk is put under a heavy I/O load.

The hardware in question is using a DWC3 2.00a IP within the Exynos5422
to provide two internal USB3 ports. One of them is connected to a
JMS578 USB-to-SATA bridge (Odroid firmware v173.01.00.02). The bridge
is then connected to an Intel SSDSC2KG240G8 (firmware XCV10132).

The crash can be triggered by running a read-heavy workload. This
triggers it for me within tens of seconds:

$ fio --filename=/dev/sda --direct=1 --rw=read --bs=4k \
 --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
 --time_based --group_reporting --name=iops-test-job \
 --eta-newline=1 --readonly

FIO output then follows this pattern:

iops-test-job: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.16
Starting 4 processes
Jobs: 4 (f=4): [R(4)][2.5%][r=341MiB/s][r=87.2k IOPS][eta 01m:57s]
Jobs: 4 (f=4): [R(4)][4.2%][r=340MiB/s][r=87.1k IOPS][eta 01m:55s]
Jobs: 4 (f=4): [R(4)][5.8%][r=337MiB/s][r=86.2k IOPS][eta 01m:53s]
Jobs: 4 (f=4): [R(4)][7.5%][r=369MiB/s][r=94.5k IOPS][eta 01m:51s]
Jobs: 4 (f=4): [R(4)][9.2%][r=364MiB/s][r=93.2k IOPS][eta 01m:49s]
Jobs: 4 (f=4): [R(4)][10.8%][r=363MiB/s][r=92.9k IOPS][eta 01m:47s]
Jobs: 4 (f=4): [R(4)][12.5%][r=348MiB/s][r=88.0k IOPS][eta 01m:45s]
Jobs: 4 (f=4): [R(4)][14.2%][r=348MiB/s][r=88.0k IOPS][eta 01m:43s]
Jobs: 4 (f=4): [R(4)][15.8%][r=377MiB/s][r=96.4k IOPS][eta 01m:41s]
Jobs: 4 (f=4): [R(4)][17.5%][r=372MiB/s][r=95.2k IOPS][eta 01m:39s]
Jobs: 4 (f=4): [R(4)][18.3%][r=77.0MiB/s][r=19.0k IOPS][eta 01m:38s]
Jobs: 4 (f=4): [R(4)][20.0%][eta 01m:36s]
< line without progress repeated many times; xHC is now unresponsive >
Jobs: 4 (f=4): [R(4)][45.8%][eta 01m:05s]
fio: io_u error on file /dev/sda: No such device: read offset=1820839936, buflen=4096
fio: pid=1863, err=19/file:io_u.c:1787, func=io_u error, error=No such device
< and so on >

Dmesg contains the following output:

[ 266.310767] xhci-hcd xhci-hcd.8.auto: xHCI host controller not responding, assume dead
[ 266.317388] xhci-hcd xhci-hcd.8.auto: HC died; cleaning up
[ 266.322710] usb 4-1: cmd cmplt err -108
[ 266.326497] usb 4-1: cmd cmplt err -108
[ 266.330313] usb 4-1: cmd cmplt err -108
[ 266.334096] usb 4-1: cmd cmplt err -108
[ 266.337942] usb 4-1: cmd cmplt err -108
[ 266.341746] usb 4-1: cmd cmplt err -108
[ 266.345561] usb 4-1: cmd cmplt err -108
[ 266.349372] usb 4-1: cmd cmplt err -108
[ 266.353187] usb 4-1: cmd cmplt err -108
[ 266.357000] usb 4-1: cmd cmplt err -108
[ 266.360809] usb 4-1: cmd cmplt err -108
[ 266.364626] usb 4-1: cmd cmplt err -108
[ 266.368439] usb 4-1: cmd cmplt err -108
[ 266.372248] usb 4-1: cmd cmplt err -108
[ 266.376063] usb 4-1: cmd cmplt err -108
[ 266.379876] usb 4-1: cmd cmplt err -108
[ 266.383688] usb 4-1: cmd cmplt err -108
[ 266.387500] usb 4-1: cmd cmplt err -108
[ 266.391314] usb 4-1: cmd cmplt err -108
[ 266.395127] usb 4-1: cmd cmplt err -108
[ 266.398943] usb 4-1: cmd cmplt err -108
[ 266.402753] usb 4-1: cmd cmplt err -108
[ 266.406565] usb 4-1: cmd cmplt err -108
[ 266.410379] usb 4-1: cmd cmplt err -108
[ 266.414165] usb 4-1: cmd cmplt err -108
[ 266.418003] usb 4-1: cmd cmplt err -108
[ 266.448629] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
< more FS errors follow >

The OS is then unable to recover (I have rootfs on that SSD too) and
the board must be manually restarted.
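
For reference, the -108 in the "cmd cmplt err" lines is -ESHUTDOWN and
fio's err=19 is ENODEV, so both follow from the controller being
declared dead rather than being separate faults. A minimal sketch for
condensing such a log into one line is below; the function name
summarize_hc_death is made up for illustration and is not an existing
tool.

```shell
# Hypothetical helper: reads kernel log text on stdin and reports whether
# the xHCI "HC died" message appeared and how many uas command
# completions failed afterwards.
summarize_hc_death() {
    awk '
        /HC died/        { died = 1 }   # xhci-hcd gave up on the controller
        /cmd cmplt err/  { n++ }        # one line per failed uas command URB
        END { printf "hc_died=%d cmd_errors=%d\n", died, n }
    '
}

# Typical use on the affected board (or on a saved log file):
#   dmesg | summarize_hc_death
```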

I can reproduce the problem on mainline 6.4-rc6 with multi_v7_defconfig
(+ CONFIG_BTRFS_FS=y for the rootfs). I bisected the problem a while
back and the first broken commit is b138e23d3dff ("usb: dwc3: core:
Enable AutoRetry feature in the controller"). Reverting this commit
locally makes my board stable again (FIO test above can run
for >10 minutes without any issues).

The crash is happening when the USB-SATA bridge is controlled by the
uas driver. I have not tested the usb-storage driver yet.
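
One way to run the same test under usb-storage is to make the kernel
ignore uas for just this bridge via the usb-storage.quirks parameter,
where the "u" flag means "do not bind uas". The sketch below only builds
the parameter string; the helper name is hypothetical, and the
152d:0578 ID in the usage comment is an assumption about what the
JMS578 reports, so it should be checked against lsusb output first.

```shell
# Hypothetical helper: print the kernel command-line parameter that tells
# usb-storage to claim the device with the given vendor and product ID
# instead of letting uas bind to it ("u" = ignore uas).
uas_ignore_param() {
    vid=$1
    pid=$2
    printf 'usb-storage.quirks=%s:%s:u\n' "$vid" "$pid"
}

# Example (the IDs are an assumption, verify them with lsusb):
#   uas_ignore_param 152d 0578
# Append the printed string to the kernel command line and reboot, then
# confirm the bound driver with: lsusb -t
```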

What do you think would be an appropriate fix here? One idea I had is
to add an Odroid-specific quirk to DWC3 to not enable AutoRetry here.
However, I'm not entirely sure this is isolated to Odroid boards.

Please let me know if you need me to do some more experiments.

Thank you,

Jakub Vanek



