Hello, On Mon, Mar 25, 2019 at 09:34:38AM +0200, Felipe Balbi wrote: > > Hi, > > Rob Weber <rob@xxxxxxxxxxx> writes: > > Hi linux-usb, > > > > My team and I are currently grappling with an issue where our custom > > hardware cannot maintain a SuperSpeed connection to a host computer > > while in device mode. After some time (as quickly as 5 seconds, or as > > long as 5 minutes), the USB connection will often drop from a SuperSpeed > > connection to a high speed connection. I would appreciate some help our > > pointers in how to dig deeper into this issue. This bug is starting to > > stretch my knowledge of USB, so any guidance is really appreciated. I've > > gathered a lot of debug information and logs which I describe in more > > detail below. > > > > Our custom design is based on the Intel CherryTrail z8550 processor. This > > SoC uses the DWC3 UDC controller. The USB port in question on our board > > please collect dwc3 tracepoints. The procedure is described here: > > https://www.kernel.org/doc/html/latest/driver-api/usb/dwc3.html#reporting-bugs > > > is a USB Type-C port that supports both UFP and DFP data roles, up to > > 5V/1.5A as a power source, and up to 30W as a power sink during USB-PD > > negotiations. We use a Cypress CCG3 for the USB-PD controller which is > > configured and controlled via I2C from an STM32 microcontroller in our > > design. We also use a SuperSpeed redriver and mux to re-condition the > > signals that come in and out of the board and to multiplex the two SS > > channels of the type-C connector into one port of the SoC based on the > > cable orientation. When connecting our device to a host with a USB type > > A cable, we configure our internal battery charger to draw 100mA of power > > from the host. If PD negotiations occur, we will configure the charger to > > pull the current that was requested/accepted during PD negotiations. > > > > Our device is running Linux kernel version 4.9.115 with a few backported > > Can you run v5.0 to see if the problem is reproduced there? > > > patches to support the intel-xhci-usb-sw role switch driver that was > > introduced in version 4.17. We ported part of the following patchset for > > this: https://lkml.org/lkml/2018/3/20/387. I am working on building our > > OS with the 4.19.18 kernel to understand if this issue is reproducible > > on more recent versions of the kernel. > > Thank you, that will help a lot :-) > > > We are able to reproduce this with the default configuration of the > > legacy g_mass_storage module: > > > > echo device > /sys/class/usb_role/intel_xhci_usb_sw-role-switch/role > > modprobe g_mass_storage file=/dev/nvme0n1p3 iSerialNumber=90405 > > > > We are also able to reproduce this issue using f_mass_storage and > > configfs. I decided to provide the logs and debug information while > > using g_mass_storage beccause it a standard gadget configuration available > > to everyone. I've been using an Ubuntu 18.04 host with Linux kernel 4.15.0 > > for most debugging as I'm able to capture the most debug information with > > Linux, but this issue is reproducible with Mac OSX and Windows hosts as > > well. > > > > I've attached an archive titled > > "host-ss-to-hs-logs.tar.gz" that contains the xHCI dynamic debug logs > > and demsg output during this event, as well as the output of lsusb -vv > > for our the device. Here is a quick timeline of events from the host's > > perspective: > > [10598.165931] host controller detects new USB connection > > [10600.773788] The ext4 filesystem exposed by the device gadget is mounted > > [10680.442486] We start to see xhci_hcd activity again > > [10684.534818] We receive the first "cannot enable" / "cannot set link state" error > > [10688.627381] We receive another "cannot enable" / "cannot set link state" error > > [10688.755190] We see the usb device be recognized as a high-speed device instead > > > > I've attached another archive titled "device-ss-to-hs-logs.tar.gz" that > > contains the dwc3 trace events for this event, the dmesg output, and the > > dwc3 regdump before and after the superspeed-to-highspeed transition. The > > cool!! Before I had to ask :-) > > > dmesg output is short (4 lines) and is probably helpful to find the > > meaningful timestamps of the dwc3 traces. The sequence of events on the > > device seems to start around 1346 seconds and ends around 1436 seconds. > > right, from device perspective I see things behaving normally, then a > "random" reset shows up. In what way does this show up? Does the host send a reset command, does the device request the host port to reset, or does the device appear to reset itself? > > Please note the "dwc3.1.auto: failed to enable ep0out" errors that are > > occurring when we connect our device to a host. These errors occur > > pretty regularly and we are not sure what is causing them. > > this shouldn't happen, actually; where do you see those? They are in the dmesg output on the device. I was assuming they line up with tracepoints in the dwc traces, but I wasn't sure what to look for. This error occurs approximately 75% of the time while the device is enumerating. > Anyway, here's the important part from dwc3 traces: > > > irq/23-dwc3-995 [000] d..1 1422.081846: dwc3_complete_trb: ep1in: 1/2 trb ffff8800364327d0 buf 00000000748ff000 size 0 ctrl 00000c18 (hlcS:SC:normal) > > irq/23-dwc3-995 [000] d..1 1422.081853: dwc3_gadget_giveback: ep1in: req ffff880177cc4e40 length 16384/16384 zsI ==> 0 > > completed 16kiB transfer block > > > irq/23-dwc3-995 [000] d..1 1422.081890: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [196615] params 00000000 00000000 00000000 --> status: Successful > > file-storage-994 [001] d..1 1422.081917: dwc3_ep_queue: ep1in: req ffff880177cc4e40 length 0/13 zsI ==> -115 > > file-storage-994 [001] d..1 1422.081928: dwc3_gadget: ep1in: req ffff880177cc4e40 dma 74905000 length 13 > > file-storage-994 [001] d..1 1422.081938: dwc3_prepare_trb: ep1in: 2/2 trb ffff8800364327f0 buf 0000000074905000 size 13 ctrl 00000c19 (HlcS:SC:normal) > > file-storage-994 [001] d..1 1422.081972: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [196615] params 00000000 00000000 00000000 --> status: Successful > > irq/23-dwc3-995 [000] d..1 1422.081980: dwc3_event: event (00100301): Link Change [U0] > > irq/23-dwc3-995 [000] d..1 1422.081988: dwc3_event: event (00110301): Link Change [U1] > > irq/23-dwc3-995 [000] d..1 1422.081997: dwc3_event: event (00004086): ep1in: Transfer In-Progress > > irq/23-dwc3-995 [000] d..1 1422.082000: dwc3_complete_trb: ep1in: 1/2 trb ffff8800364327e0 buf 0000000074903000 size 0 ctrl 00000c18 (hlcS:SC:normal) > > irq/23-dwc3-995 [000] d..1 1422.082008: dwc3_gadget_giveback: ep1in: req ffff880177cc4b40 length 8192/8192 zsI ==> 0 > > 8kiB block > > > irq/23-dwc3-995 [000] d..1 1422.082048: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [196615] params 00000000 00000000 00000000 --> status: Successful > > file-storage-994 [001] d..1 1422.082089: dwc3_ep_queue: ep1out: req ffff880177cc4d80 length 0/1024 zsI ==> -115 > > file-storage-994 [001] d..1 1422.082101: dwc3_gadget: ep1out: req ffff880177cc4d80 dma 74905800 length 1024 > > file-storage-994 [001] d..1 1422.082111: dwc3_prepare_trb: ep1out: 4/2 trb ffff880036423780 buf 0000000074905800 size 1024 ctrl 00000c19 (HlcS:SC:normal) > > file-storage-994 [001] d..1 1422.082144: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [131079] params 00000000 00000000 00000000 --> status: Successful > > irq/23-dwc3-995 [000] d..1 1422.082147: dwc3_event: event (00100301): Link Change [U0] > > irq/23-dwc3-995 [000] d..1 1422.082155: dwc3_event: event (00110301): Link Change [U1] > > irq/23-dwc3-995 [000] d..1 1422.082160: dwc3_event: event (00100301): Link Change [U0] > > irq/23-dwc3-995 [000] d..1 1422.082167: dwc3_event: event (00110301): Link Change [U1] > > irq/23-dwc3-995 [000] d..1 1422.082173: dwc3_event: event (00004086): ep1in: Transfer In-Progress > > irq/23-dwc3-995 [000] d..1 1422.082176: dwc3_complete_trb: ep1in: 0/2 trb ffff8800364327f0 buf 0000000074905000 size 0 ctrl 00000c18 (hlcS:SC:normal) > > irq/23-dwc3-995 [000] d..1 1422.082181: dwc3_gadget_giveback: ep1in: req ffff880177cc4e40 length 13/13 zsI ==> 0 > > CSW What is CSW? > > irq/23-dwc3-995 [000] d..1 1422.092840: dwc3_event: event (00120301): Link Change [U2] > > irq/23-dwc3-995 [000] d..1 1427.800204: dwc3_event: event (00160301): Link Change [SS.Inactive] > > irq/23-dwc3-995 [000] d..1 1427.813205: dwc3_event: event (00150301): Link Change [RX.Detect] > > irq/23-dwc3-995 [000] d..1 1427.911837: dwc3_event: event (00040301): Link Change [SS.Disabled] > > irq/23-dwc3-995 [000] d..1 1427.911849: dwc3_event: event (00000301): Link Change [U0] > > irq/23-dwc3-995 [000] d..1 1427.914818: dwc3_event: event (00050301): Link Change [RX.Detect] > > irq/23-dwc3-995 [000] d..1 1427.917746: dwc3_event: event (00030301): Link Change [U3] > > link went to U3. Why would xhci put the link to U3 now? We're still > working fine. This looks like a pretty crazy set of transitions. I'm unfamiliar with LPM. Is this sequence of events normal for a transition to U3? During this particular run, I was not performing any operations on the mass storage device besides mounting it on my computer. No reading or writing besides that. Would this sort of inactivity have something to do with the transition to U3? > > irq/23-dwc3-995 [000] d..1 1435.982020: dwc3_event: event (00000101): Reset [U0] > > 8 seconds later, XHCI drivers a bus Reset > > > irq/23-dwc3-995 [000] d..1 1435.982116: dwc3_gadget_ep_cmd: ep1out: cmd 'End Transfer' [133384] params 00000000 00000000 00000000 --> status: Successful > > file-storage-994 [001] .... 1435.982186: dwc3_ep_dequeue: ep1out: req ffff880177cc4d80 length 0/1024 zsI ==> -115 > > irq/23-dwc3-995 [000] d..1 1435.982230: dwc3_gadget_giveback: ep1out: req ffff880177cc4d80 length 0/1024 zsI ==> -108 > > irq/23-dwc3-995 [000] d..1 1435.990478: dwc3_gadget_ep_cmd: ep1in: cmd 'End Transfer' [198920] params 00000000 00000000 00000000 --> status: Successful > > file-storage-994 [001] .... 1435.990596: dwc3_free_request: ep1in: req ffff880177cc4b40 length 8192/8192 zsI ==> 0 > > file-storage-994 [001] .... 1435.990607: dwc3_free_request: ep1out: req ffff880177cc4d80 length 0/1024 zsI ==> -108 > > file-storage-994 [001] .... 1435.990612: dwc3_free_request: ep1in: req ffff880177cc4e40 length 13/13 zsI ==> 0 > > file-storage-994 [001] .... 1435.990616: dwc3_free_request: ep1out: req ffff880177cc4f00 length 31/1024 zsI ==> 0 > > file-storage-994 [001] d..1 1435.990622: dwc3_gadget: Disabling ep1in > > file-storage-994 [001] d..1 1435.990641: dwc3_gadget: Disabling ep1out > > irq/23-dwc3-995 [000] d..1 1435.990710: dwc3_event: event (080201c4): ep1out: Endpoint Command Complete > > irq/23-dwc3-995 [000] d..1 1435.990718: dwc3_event: event (080301c6): ep1in: Endpoint Command Complete > > All endpoints are disabled and transfers are cancelled. > > > irq/23-dwc3-995 [000] d..1 1436.038946: dwc3_event: event (00000201): Connection Done [U0] > > irq/23-dwc3-995 [000] d..1 1436.038987: dwc3_gadget: Enabling ep0out > > irq/23-dwc3-995 [000] d..1 1436.039037: dwc3_gadget_ep_cmd: ep0out: cmd 'Set Endpoint Configuration' [1] params 80000200 00000500 00000000 --> status: Successful > > new enumeration in HS. The first parameter here contains the packet size > in the first 16 bits: > > 0x0200 >> 3 = 64 > > > irq/23-dwc3-995 [000] d..1 1436.039053: dwc3_gadget: Enabling ep0in > > irq/23-dwc3-995 [000] d..1 1436.039095: dwc3_gadget_ep_cmd: ep0in: cmd 'Set Endpoint Configuration' [1] params 80000200 02000500 00000000 --> status: Successful > > irq/23-dwc3-995 [000] d..1 1436.039114: dwc3_event: event (00000301): Link Change [U0] > > irq/23-dwc3-995 [000] d..1 1436.109708: dwc3_event: event (0000c040): ep0out: Transfer Complete > > irq/23-dwc3-995 [000] d..1 1436.109722: dwc3_ep0: ep0out: Transfer Complete: state 'Setup Phase' > > irq/23-dwc3-995 [000] d..1 1436.109728: dwc3_ep0: Setup Phase > > Transfers continue in HW after this. > > > Lastly, I've been able to recreate this event while monitoring the > > activity with a USB protocol analyzer. I've attached a screenshot of the > > USB transactions that happen around the time of the drop from SuperSpeed > > to high speed. I can also provide an archive containing the raw output > > that can be viewed in the "USB Protocol Suite" software provided by > > Teledyne LaCroy if you're interested. The raw data is too large to > > include on this email. > > Sure, that would be great. How can I download it? Any link which I can > curl? You can send it to me only, if you want; no problem. You (and anyone reading) can download it here: https://drive.google.com/file/d/1WSJ-222bguXTsRZ-mI5u2M7IP9FsZdj7/view?usp=sharing > > I would appreciate any help or guidance anyone could provide in > > understanding more about this issue. We really want to understand why > > we're transitioning from superspeed to high speed and whether or not > > this issue is caused by software or hardware. I understand it might be > > difficult for someone to help with a custom design and an old kernel, > > but any pointers for further debugging would be incredibly helpful. > > Thank you in advance for taking a look at all this data! > > Sure thing, I'll ask Mathias (XHCI maintainer) for his help on this too > since it's unclear what's causing the switch to HS when exitting U3. Thanks in advance for taking a look, Mathias! If there is any more information you need from me, please let me know. Cheers, Rob Weber