Hi all,

I've done a few more tests. I'm also adding the required information described in the DWC3 documentation, which I previously missed.

> The OS is then unable to recover (I have rootfs on that SSD too) and
> the board must be manually restarted.

I have resolved this by creating a fresh Ubuntu 20.04 rootfs on an SD card. The system now survives the controller crash. The xHC can also be brought up again by unbinding the dwc3 driver and then binding it back.

> Dmesg contains the following output:
> < ... >

It turns out that this was not the full relevant output. I was collecting the logs from a serial console and I had not properly enabled verbose printing. Hopefully the full dmesg is now linked below.

> The crash is happening when the USB-SATA bridge is controlled by the
> uas driver. I have not tested the usb-storage driver yet.

I have tested this now. With usb-storage the controller is stable, but the achievable throughput is lower (75 MB/s BOT vs 300 MB/s UAS).

---

With the rootfs on the SD card, I was able to capture a DWC3 event trace & register dump. I am running clean 6.4-rc6 with a config similar to multi_v7_defconfig (see below for details).

To capture the trace, I followed these steps:

1. Unbind the DWC3 driver from the controller (12000000.usb).
2. Enable DWC3 tracing.
3. Bind the DWC3 driver back.
4. Save the DWC3 register dump to "regdump-before-fio.txt".
5. Run the FIO stress test from the first email. Once FIO stops printing IOPS, dump registers again to "regdump-during-freeze.txt".
6. Once FIO exits and the kernel prints the "HC died" message, dump registers once more to "regdump-after-hc-died.txt".
7. Save the current trace buffer to "trace.txt".
8. Save the current kernel log to "dmesg.txt".

I had to do the DWC3 unbind-bind dance because I have no way of unplugging the onboard JMS578 bridge from the main Exynos chip.

The resulting files can be found in the attached tarball, including the kernel config (I kept ARM_EXYNOS_BUS_DEVFREQ enabled this time).
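For anyone who wants to repeat the capture, the steps above can be sketched roughly as below. This is only an illustration, not the exact commands I ran. The sysfs/debugfs locations are assumptions based on the usual platform-driver bind/unbind files and the layout described in the DWC3 documentation, and the helper names (capture_paths, rebind_with_tracing, snapshot) are made up for this example.

```python
from pathlib import Path

DEVICE = "12000000.usb"  # controller name from the steps above


def capture_paths(device, sysfs=Path("/sys"), debugfs=Path("/sys/kernel/debug")):
    """Return the control files used for the capture.

    All locations are assumptions; adjust them if debugfs is mounted
    elsewhere or the driver directory is named differently.
    """
    driver = sysfs / "bus/platform/drivers/dwc3"
    return {
        "unbind": driver / "unbind",
        "bind": driver / "bind",
        "trace_enable": debugfs / "tracing/events/dwc3/enable",
        "trace": debugfs / "tracing/trace",
        "regdump": debugfs / "usb" / device / "regdump",
    }


def rebind_with_tracing(device, paths):
    """Steps 1-3: unbind the driver, enable DWC3 events, bind it back."""
    paths["unbind"].write_text(device)
    paths["trace_enable"].write_text("1")
    paths["bind"].write_text(device)


def snapshot(src, dest):
    """Steps 4-7: copy a register dump or the trace buffer to a file."""
    Path(dest).write_text(Path(src).read_text())
```

As root, this would be used as paths = capture_paths(DEVICE), then rebind_with_tracing(DEVICE, paths), then snapshot(paths["regdump"], "regdump-before-fio.txt") and so on at each point listed above, finishing with snapshot(paths["trace"], "trace.txt").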
Dmesg.txt is also available at https://pastebin.com/EkfXKMih .

I am not 100% sure this is not a hardware fault. However, there are a few Exynos5422-based Odroid users experiencing a similar issue. Most of them mention kernel 5.4, which does contain the bisected bad commit.

- https://forum.odroid.com/viewtopic.php?t=42630 (my own report, but other people there have the same issue)
- https://forum.odroid.com/viewtopic.php?t=46409
- https://forum.armbian.com/topic/20582-odroid-xu4-usb-sata-ssd-drive-random-disconnect/

Please let me know if you need more information.

Thank you,
Jakub Vanek
Attachment:
dwc3-logs.tar.gz
Description: application/compressed-tar