Re: usb: dwc3: HC dies under high I/O load on Exynos5422

Sorry for the delay in response. I was away.

On Fri, Jun 16, 2023, Jakub Vaněk wrote:
> Hi all,
> 
> I've discovered that on recent kernels the xHCI controller on Odroid
> HC2 dies when a USB-attached disk is put under a heavy I/O load.
> 
> The hardware in question is using a DWC3 2.00a IP within the Exynos5422

Just want to clarify, this is dwc_usb3 v2.00a and not dwc_usb31.

> to provide two internal USB3 ports. One of them is connected to a
> JMS578 USB-to-SATA bridge (Odroid firmware v173.01.00.02). The bridge
> is then connected to an Intel SSDSC2KG240G8 (firmware XCV10132).
> 
> The crash can be triggered by running a read-heavy workload. This
> triggers it for me within tens of seconds:
> 
> $ fio --filename=/dev/sda --direct=1 --rw=read --bs=4k \
>  --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
>  --time_based --group_reporting --name=iops-test-job \
>  --eta-newline=1 --readonly
> 
> FIO output then follows this pattern:
> 
> iops-test-job: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> 4096B-4096B, ioengine=libaio, iodepth=256
> ...
> fio-3.16
> Starting 4 processes
> Jobs: 4 (f=4): [R(4)][2.5%][r=341MiB/s][r=87.2k IOPS][eta 01m:57s]
> Jobs: 4 (f=4): [R(4)][4.2%][r=340MiB/s][r=87.1k IOPS][eta 01m:55s]
> Jobs: 4 (f=4): [R(4)][5.8%][r=337MiB/s][r=86.2k IOPS][eta 01m:53s]
> Jobs: 4 (f=4): [R(4)][7.5%][r=369MiB/s][r=94.5k IOPS][eta 01m:51s]
> Jobs: 4 (f=4): [R(4)][9.2%][r=364MiB/s][r=93.2k IOPS][eta 01m:49s]
> Jobs: 4 (f=4): [R(4)][10.8%][r=363MiB/s][r=92.9k IOPS][eta 01m:47s]
> Jobs: 4 (f=4): [R(4)][12.5%][r=348MiB/s][r=88.0k IOPS][eta 01m:45s]
> Jobs: 4 (f=4): [R(4)][14.2%][r=348MiB/s][r=88.0k IOPS][eta 01m:43s]
> Jobs: 4 (f=4): [R(4)][15.8%][r=377MiB/s][r=96.4k IOPS][eta 01m:41s]
> Jobs: 4 (f=4): [R(4)][17.5%][r=372MiB/s][r=95.2k IOPS][eta 01m:39s]
> Jobs: 4 (f=4): [R(4)][18.3%][r=77.0MiB/s][r=19.0k IOPS][eta 01m:38s]
> Jobs: 4 (f=4): [R(4)][20.0%][eta 01m:36s]
> < line without progress repeated many times; xHC is now unresponsive >
> Jobs: 4 (f=4): [R(4)][45.8%][eta 01m:05s]
> fio: io_u error on file /dev/sda: No such device: read
> offset=1820839936, buflen=4096
> fio: pid=1863, err=19/file:io_u.c:1787, func=io_u error, error=No such
> device
> < and so on >
> 
> Dmesg contains the following output:
> 
> [ 266.310767] xhci-hcd xhci-hcd.8.auto: xHCI host controller not
> responding, assume dead
> [ 266.317388] xhci-hcd xhci-hcd.8.auto: HC died; cleaning up
> [ 266.322710] usb 4-1: cmd cmplt err -108
> [ 266.326497] usb 4-1: cmd cmplt err -108
> [ 266.330313] usb 4-1: cmd cmplt err -108
> [ 266.334096] usb 4-1: cmd cmplt err -108
> [ 266.337942] usb 4-1: cmd cmplt err -108
> [ 266.341746] usb 4-1: cmd cmplt err -108
> [ 266.345561] usb 4-1: cmd cmplt err -108
> [ 266.349372] usb 4-1: cmd cmplt err -108
> [ 266.353187] usb 4-1: cmd cmplt err -108
> [ 266.357000] usb 4-1: cmd cmplt err -108
> [ 266.360809] usb 4-1: cmd cmplt err -108
> [ 266.364626] usb 4-1: cmd cmplt err -108
> [ 266.368439] usb 4-1: cmd cmplt err -108
> [ 266.372248] usb 4-1: cmd cmplt err -108
> [ 266.376063] usb 4-1: cmd cmplt err -108
> [ 266.379876] usb 4-1: cmd cmplt err -108
> [ 266.383688] usb 4-1: cmd cmplt err -108
> [ 266.387500] usb 4-1: cmd cmplt err -108
> [ 266.391314] usb 4-1: cmd cmplt err -108
> [ 266.395127] usb 4-1: cmd cmplt err -108
> [ 266.398943] usb 4-1: cmd cmplt err -108
> [ 266.402753] usb 4-1: cmd cmplt err -108
> [ 266.406565] usb 4-1: cmd cmplt err -108
> [ 266.410379] usb 4-1: cmd cmplt err -108
> [ 266.414165] usb 4-1: cmd cmplt err -108
> [ 266.418003] usb 4-1: cmd cmplt err -108
> [ 266.448629] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 0, rd
> 1, flush 0, corrupt 0, gen 0
> < more FS errors follow >
> 
> The OS is then unable to recover (I have rootfs on that SSD too) and
> the board must be manually restarted.
> 
> I can reproduce the problem on mainline 6.4-rc6 with multi_v7_defconfig
> (+ CONFIG_BTRFS=y for the rootfs). I've bisected the problem a while
> back and the first broken commit is b138e23d3dff ("usb: dwc3: core:
> Enable AutoRetry feature in the controller"). Reverting this commit
> locally makes my board stable again (FIO test above can run
> for >10 minutes without any issues).

This info helps a lot.

> 
> The crash is happening when the USB-SATA bridge is controlled by the
> uas driver. I have not tested the usb-storage driver yet.
> 
> What do you think would be an appropriate fix here? One idea I had is
> to add an Odroid-specific quirk to DWC3 to not enable AutoRetry here.
> However, I'm not entirely sure this is isolated to Odroid boards.
> 
> Please let me know if you need me to do some more experiments.
> 

This failure indicates that whichever device you're testing against
could not retry with burst (NumP != 0) after a CRC error. After a period
of time, the host timed out and attempted to restore its operations by
stopping the active transfers with a Stop-ep command. However, for some
reason, the host doesn't respond to this command. The crash you observed
is probably a separate issue. The main issue is why the host doesn't
receive a command completion event. If you're our direct customer, you
can submit a STAR request for our support. I'm not aware of this type of
failure related to AutoRetry. However, given how old this controller
version is (released over a decade ago), I can't be sure.

I think if you try to test against a different device, you may not
observe this same failure.

To resolve this, please contact our support team to investigate
further and see whether it's a setup issue. Otherwise, we can disable
this feature for dwc_usb3 v2.00a. Depending on how bad the CRC error
rate is (which should be low), this should not affect performance much.
I don't think this necessarily needs a new DT property.

Something like this:

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 0beaab932e7d..1bfd8b127240 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1209,8 +1209,9 @@ static int dwc3_core_init(struct dwc3 *dwc)
 		dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
 	}
 
-	if (dwc->dr_mode == USB_DR_MODE_HOST ||
-	    dwc->dr_mode == USB_DR_MODE_OTG) {
+	if (!DWC3_VER_IS(DWC3, 200A) &&
+	    (dwc->dr_mode == USB_DR_MODE_HOST ||
+	     dwc->dr_mode == USB_DR_MODE_OTG)) {
 		reg = dwc3_readl(dwc->regs, DWC3_GUCTL);
 
 		/*

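For anyone reading along without the tree handy, the gating in the diff
above can be modelled in plain userspace C. This is only a sketch: the
sketch_* names are hypothetical, the revision and bit values are
assumptions about what DWC3_REVISION_200A and DWC3_GUCTL_HSTINAUTORETRY
expand to, and the real DWC3_VER_IS() also checks the IP type, which is
left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-ins for the kernel definitions. Both values are assumptions made
 * for this sketch; check drivers/usb/dwc3/core.h for the real ones.
 */
#define SKETCH_REVISION_200A        0x5533200aU
#define SKETCH_GUCTL_HSTINAUTORETRY (1U << 14)

/* Hypothetical mirror of the dr_mode values the diff tests against. */
enum sketch_dr_mode {
	SKETCH_MODE_PERIPHERAL,
	SKETCH_MODE_HOST,
	SKETCH_MODE_OTG,
};

/* AutoRetry is wanted in host or OTG mode, except on v2.00a. */
static bool sketch_wants_autoretry(uint32_t revision, enum sketch_dr_mode mode)
{
	if (revision == SKETCH_REVISION_200A)
		return false;

	return mode == SKETCH_MODE_HOST || mode == SKETCH_MODE_OTG;
}

/* Return the GUCTL value to write back, given the value read. */
static uint32_t sketch_guctl_value(uint32_t guctl, uint32_t revision,
				   enum sketch_dr_mode mode)
{
	if (sketch_wants_autoretry(revision, mode))
		guctl |= SKETCH_GUCTL_HSTINAUTORETRY;

	return guctl;
}
```

With this model, a v2.00a core in host mode leaves GUCTL unchanged, while
any other revision in host or OTG mode gets the AutoRetry bit set on top
of whatever was already in the register.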

Thanks,
Thinh



