Re: [PATCH] Re: xhci bandwidth problem with isochronous endpoints

Curt Meyers <cmeyers@xxxxxx> · Tue, 14 Nov 2017 11:16:53 -0800

On 11/08/2017 12:19 PM, Curt Meyers wrote:
On 11/08/2017 03:42 AM, Mathias Nyman wrote:
On 07.11.2017 02:05, Curt Meyers wrote:
On 11/06/2017 06:57 AM, Mathias Nyman wrote:
On 03.11.2017 23:37, Curt Meyers wrote:
On 09/19/2017 06:00 AM, Mathias Nyman wrote:
On 08.09.2017 20:35, Curt Meyers wrote:
On 09/05/2017 02:56 PM, Curt Meyers wrote:
On 09/04/2017 04:13 AM, Mathias Nyman wrote:
On 04.09.2017 13:46, Felipe Balbi wrote:

Hi,

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes:
Unfortunately config endpoint command doesn't log endpoint 
context in this log,
it should call trace_xhci_handle_cmd_config_ep(ep_ctx), I 
don't know why it's missing.

That's called conditionally:

    case TRB_CONFIG_EP:
        if (!cmd->completion)
            xhci_handle_cmd_config_ep(xhci, slot_id, event,
                          cmd_comp_code);

Yep, need to change the tracing so we get it for every config 
endpoint command


But later on at Set TR Dequeue Pointer Command it logs the 
endpiont context:

259.147237: xhci_handle_cmd_set_deq: RS 00000 super-speed 
Ctx Entries 7 MEL 512 us Port# 19/0 [TT Slot 0 Port# 0 TTT 0 
Intr 0] Addr 1 State configured
259.147238: xhci_handle_cmd_set_deq_ep: State stopped mult 1 
max P. Streams 0 interval 125 us max ESIT payload 201326592 
CErr 0 Type Isoc IN burst 2 maxp 1024 deq 00000003f9fd6510 
avg trb len 3072

This looks odd,  201326592 bytes per ESIT, way too much. So 
much that I suspect tracing decodes it wrong

try this:

modified   drivers/usb/host/xhci.h
@@ -2540,9 +2540,7 @@ static inline const char 
*xhci_decode_ep_context(u32 info, u32 info2, u64 deq,
      u8 lsa;
      u8 hid;

-    esit = EP_MAX_ESIT_PAYLOAD_HI(info) << 16 |
-        EP_MAX_ESIT_PAYLOAD_LO(tx_info);
-
+    esit = CTX_TO_MAX_ESIT_PAYLOAD(info);
      ep_state = info & EP_STATE_MASK;
      max_pstr = info & EP_MAXPSTREAMS_MASK;
      interval = CTX_TO_EP_INTERVAL(info);


Yes, I noticed the same, and there's also a high ESIT bit 
field for new hosts,
I pushed a fix to my for-usb-linus branch for that

+#define CTX_TO_MAX_ESIT_PAYLOAD_HI(p)  (((p) >> 24) & 0xff)

...

-       esit = EP_MAX_ESIT_PAYLOAD_HI(info) << 16 |
-               EP_MAX_ESIT_PAYLOAD_LO(tx_info);
+       esit = CTX_TO_MAX_ESIT_PAYLOAD_HI(info) << 16 |
+               CTX_TO_MAX_ESIT_PAYLOAD(tx_info);

-Mathias
Hi Mathias,



Hi Sorry about the delay


I have made the above suggested changed and produced a new 
trace file. I hope this helps out.

Curt

Would it help if I sent you a couple of these USB camera devices 
to speed up debugging?


Not really, It's just my TODO list is already quite long, and 
then there's always some urgent matter
that requires attention.

The new log however still shows the wrong ESIT payload values,
 xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 
0 interval 125 us max ESIT payload 201326592

-Mathias

--
To unsubscribe from this list: send the line "unsubscribe 
linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

Hi Mathias,

The manufacture hired a Linux developer to solve the problem with 
this isochronous camera device and it turns out to be related to 
LPM that only shows up in newer Intel hardware, according to the 
developer. He says that there is something wrong with the MEL (Max 
Exit Latency) and skipping it in the call to 
xhci_configure_endpoint() fixes the problem, or works around it.

Curt


Ok, good to know

If you are in contact with this developer please tell him that we are
interested in getting more details about the LPM issue, and getting it
fixed upstream.

Thanks
Mathias
--
To unsubscribe from this list: send the line "unsubscribe 
linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

This patch works on the current 4.10 kernel tree in Ubuntu. This 
fixes the problem we where seeing on an Intel Sunrise chipset but I 
also see the same failure on the Intel controller with PCI device ID 
0xa2af after generating this patch and testing it. I do not 
understand what the common thread is between all the various Intel 
controllers so perhaps you can help out in making this more broad.

If you want the patch applied to a specific branch let me know and I 
can try it again but let me know specifically which tag or branch to 
apply this to.

Signed-off-by: Curt Meyers <cmeyers@xxxxxx>
---

diff -Naur host.ori/xhci.c host/xhci.c

--- host.ori/xhci.c    2017-11-02 01:50:04.544937000 +0800
+++ host/xhci.c    2017-11-03 00:07:04.548216000 +0800
@@ -3986,6 +3986,26 @@
      unsigned long flags;
      int ret;

+    bool Period_EP_Found = false;
+    if ((xhci->quirks & XHCI_SET_MEL_STUCK) && (USB_SPEED_SUPER == 
udev->speed)) {
+        int i = 0;
+        for (i = 1; i < 16; ++i) {
+            int ep_in_type  = (udev && udev->ep_in[i]) 
?(usb_endpoint_type(&udev->ep_in[i]->desc))  :(-1);
+            int ep_out_type = (udev && udev->ep_out[i]) 
?(usb_endpoint_type(&udev->ep_out[i]->desc)) :(-1);
+
+            if ((USB_ENDPOINT_XFER_ISOC == ep_in_type) || 
(USB_ENDPOINT_XFER_INT == ep_in_type)) {
+                Period_EP_Found = true;
+                break;
+            }
+
+            if ((USB_ENDPOINT_XFER_ISOC == ep_out_type) || 
(USB_ENDPOINT_XFER_INT == ep_out_type)) {
+                Period_EP_Found = true;
+                break;
+            }
+        }
+    }
+
+
      spin_lock_irqsave(&xhci->lock, flags);

      virt_dev = xhci->devs[udev->slot_id];
@@ -4026,8 +4046,9 @@
      xhci_dbg_ctx(xhci, command->in_ctx, 0);

      /* Issue and wait for the evaluate context command. */
-    ret = xhci_configure_endpoint(xhci, udev, command,
-            true, true);
+    ret = (!Period_EP_Found)
+        ? xhci_configure_endpoint(xhci, udev, command, true, true)
+        : xhci_configure_endpoint(xhci, udev, command, false, true);
      xhci_dbg(xhci, "Slot %u Output Context:\n", udev->slot_id);

Ok, so basically this code forces a 'configure endpoint' command 
instead of a 'evaluate context'
command if there are periodic endpoints in use.

I'm not sure about the reasoning behind this.

I do know that configure endpoint command will release the bandwidth 
kept by
dropped endpoints, but setting MEL values are usually done using the
'evaluate context' command.

-Mathias

I just received this updated patch, it adds a call to 
spin_unlock_irqrestore() and a comment for the code.

The comment says that LPM with isochronous endpoints causes config_EP 
to fail.

---

diff -Naur host.ori/xhci.c host/xhci.c
--- host.ori/xhci.c 2017-11-02 01:50:04.544937000 +0800
+++ host/xhci.c 2017-11-08 01:40:38.929681000 +0800
@@ -4001,6 +4001,20 @@
    return 0;
  }

+ // Workaround: Enable LPM when a SS ISOC EP is opened will cause all 
subsequent config_EP fail
+ if ((xhci->quirks & XHCI_SET_MEL_STUCK) && (USB_SPEED_SUPER == 
udev->speed) && (max_exit_latency > 0))
+ {
+   int i;
+   for (i = 0; i < 31; i++) {
+     struct xhci_ep_ctx *ep_ctx = xhci_get_ep_ctx(xhci, 
virt_dev->out_ctx, i);
+     int ep_type = CTX_TO_EP_TYPE(le32_to_cpu(ep_ctx->ep_info2));
+     if ((ISOC_IN_EP == ep_type) || (ISOC_OUT_EP == ep_type)) {
+       spin_unlock_irqrestore(&xhci->lock, flags);
+       return 0;
+     }
+   }
+ }
+
  /* Attempt to issue an Evaluate Context command to change the MEL. */
  command = xhci->lpm_command;
  ctrl_ctx = xhci_get_input_control_ctx(command->in_ctx);
diff -Naur host.ori/xhci.h host/xhci.h
--- host.ori/xhci.h 2017-11-02 01:50:04.544937000 +0800
+++ host/xhci.h 2017-11-02 23:41:17.000000000 +0800
@@ -1650,6 +1650,8 @@
 #define XHCI_SSIC_PORT_UNUSED  (1 << 22)
 #define XHCI_NO_64BIT_SUPPORT  (1 << 23)
 #define XHCI_MISSING_CAS (1 << 24)
+#define XHCI_SET_MEL_STUCK (1 << 31)
+
  unsigned int    num_active_eps;
  unsigned int    limit_active_eps;
  /* There are two roothubs to keep track of bus suspend info for */
diff -Naur host.ori/xhci-pci.c host/xhci-pci.c
--- host.ori/xhci-pci.c 2017-11-02 01:50:04.536937000 +0800
+++ host/xhci-pci.c 2017-11-08 01:40:38.929681000 +0800
@@ -168,6 +168,7 @@
     pdev->device == PCI_DEVICE_ID_INTEL_BROXTON_B_XHCI ||
     pdev->device == PCI_DEVICE_ID_INTEL_APL_XHCI)) {
    xhci->quirks |= XHCI_PME_STUCK_QUIRK;
+   xhci->quirks |= XHCI_SET_MEL_STUCK;
  }
  if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
     pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI) {

---


~
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hi Mathias,

Not sure if you saw my second diff, it is slightly different than the 
first. Since I am going between the developer and the maintainers is 
there something I can do to move this investigation along?

This patch provides a fix for the serious problem we have found with 
USB3 isochronous devices running on newer Intel hardware.

Perhaps you or someone else at Intel that is very familiar with the USB 
hardware can better understand the implications of this patch.

Thanks,
Curt

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html