Re: [PATCH] Re: xhci bandwidth problem with isochronous endpoints

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/08/2017 12:19 PM, Curt Meyers wrote:
On 11/08/2017 03:42 AM, Mathias Nyman wrote:
On 07.11.2017 02:05, Curt Meyers wrote:
On 11/06/2017 06:57 AM, Mathias Nyman wrote:
On 03.11.2017 23:37, Curt Meyers wrote:
On 09/19/2017 06:00 AM, Mathias Nyman wrote:
On 08.09.2017 20:35, Curt Meyers wrote:
On 09/05/2017 02:56 PM, Curt Meyers wrote:
On 09/04/2017 04:13 AM, Mathias Nyman wrote:
On 04.09.2017 13:46, Felipe Balbi wrote:

Hi,

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes:
Unfortunately config endpoint command doesn't log endpoint context in this log, it should call trace_xhci_handle_cmd_config_ep(ep_ctx), I don't know why it's missing.

That's called conditionally:

    case TRB_CONFIG_EP:
        if (!cmd->completion)
            xhci_handle_cmd_config_ep(xhci, slot_id, event,
                          cmd_comp_code);

Yep, need to change the tracing so we get it for every config endpoint command


But later on at Set TR Dequeue Pointer Command it logs the endpiont context:

259.147237: xhci_handle_cmd_set_deq: RS 00000 super-speed Ctx Entries 7 MEL 512 us Port# 19/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 1 State configured 259.147238: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 201326592 CErr 0 Type Isoc IN burst 2 maxp 1024 deq 00000003f9fd6510 avg trb len 3072

This looks odd,  201326592 bytes per ESIT, way too much. So much that I suspect tracing decodes it wrong

try this:

modified   drivers/usb/host/xhci.h
@@ -2540,9 +2540,7 @@ static inline const char *xhci_decode_ep_context(u32 info, u32 info2, u64 deq,
      u8 lsa;
      u8 hid;

-    esit = EP_MAX_ESIT_PAYLOAD_HI(info) << 16 |
-        EP_MAX_ESIT_PAYLOAD_LO(tx_info);
-
+    esit = CTX_TO_MAX_ESIT_PAYLOAD(info);
      ep_state = info & EP_STATE_MASK;
      max_pstr = info & EP_MAXPSTREAMS_MASK;
      interval = CTX_TO_EP_INTERVAL(info);


Yes, I noticed the same, and there's also a high ESIT bit field for new hosts,
I pushed a fix to my for-usb-linus branch for that

+#define CTX_TO_MAX_ESIT_PAYLOAD_HI(p)  (((p) >> 24) & 0xff)

...

-       esit = EP_MAX_ESIT_PAYLOAD_HI(info) << 16 |
-               EP_MAX_ESIT_PAYLOAD_LO(tx_info);
+       esit = CTX_TO_MAX_ESIT_PAYLOAD_HI(info) << 16 |
+               CTX_TO_MAX_ESIT_PAYLOAD(tx_info);

-Mathias
Hi Mathias,



Hi Sorry about the delay


I have made the above suggested changed and produced a new trace file. I hope this helps out.

Curt

Would it help if I sent you a couple of these USB camera devices to speed up debugging?


Not really, It's just my TODO list is already quite long, and then there's always some urgent matter
that requires attention.

The new log however still shows the wrong ESIT payload values,
 xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 201326592

-Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

Hi Mathias,

The manufacture hired a Linux developer to solve the problem with this isochronous camera device and it turns out to be related to LPM that only shows up in newer Intel hardware, according to the developer. He says that there is something wrong with the MEL (Max Exit Latency) and skipping it in the call to xhci_configure_endpoint() fixes the problem, or works around it.

Curt


Ok, good to know

If you are in contact with this developer please tell him that we are
interested in getting more details about the LPM issue, and getting it
fixed upstream.

Thanks
Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

This patch works on the current 4.10 kernel tree in Ubuntu. This fixes the problem we where seeing on an Intel Sunrise chipset but I also see the same failure on the Intel controller with PCI device ID 0xa2af after generating this patch and testing it. I do not understand what the common thread is between all the various Intel controllers so perhaps you can help out in making this more broad.

If you want the patch applied to a specific branch let me know and I can try it again but let me know specifically which tag or branch to apply this to.

Signed-off-by: Curt Meyers <cmeyers@xxxxxx>
---

diff -Naur host.ori/xhci.c host/xhci.c
--- host.ori/xhci.c    2017-11-02 01:50:04.544937000 +0800
+++ host/xhci.c    2017-11-03 00:07:04.548216000 +0800
@@ -3986,6 +3986,26 @@
      unsigned long flags;
      int ret;

+    bool Period_EP_Found = false;
+    if ((xhci->quirks & XHCI_SET_MEL_STUCK) && (USB_SPEED_SUPER == udev->speed)) {
+        int i = 0;
+        for (i = 1; i < 16; ++i) {
+            int ep_in_type  = (udev && udev->ep_in[i]) ?(usb_endpoint_type(&udev->ep_in[i]->desc))  :(-1); +            int ep_out_type = (udev && udev->ep_out[i]) ?(usb_endpoint_type(&udev->ep_out[i]->desc)) :(-1);
+
+            if ((USB_ENDPOINT_XFER_ISOC == ep_in_type) || (USB_ENDPOINT_XFER_INT == ep_in_type)) {
+                Period_EP_Found = true;
+                break;
+            }
+
+            if ((USB_ENDPOINT_XFER_ISOC == ep_out_type) || (USB_ENDPOINT_XFER_INT == ep_out_type)) {
+                Period_EP_Found = true;
+                break;
+            }
+        }
+    }
+
+
      spin_lock_irqsave(&xhci->lock, flags);

      virt_dev = xhci->devs[udev->slot_id];
@@ -4026,8 +4046,9 @@
      xhci_dbg_ctx(xhci, command->in_ctx, 0);

      /* Issue and wait for the evaluate context command. */
-    ret = xhci_configure_endpoint(xhci, udev, command,
-            true, true);
+    ret = (!Period_EP_Found)
+        ? xhci_configure_endpoint(xhci, udev, command, true, true)
+        : xhci_configure_endpoint(xhci, udev, command, false, true);
      xhci_dbg(xhci, "Slot %u Output Context:\n", udev->slot_id);

Ok, so basically this code forces a 'configure endpoint' command instead of a 'evaluate context'
command if there are periodic endpoints in use.

I'm not sure about the reasoning behind this.

I do know that configure endpoint command will release the bandwidth kept by
dropped endpoints, but setting MEL values are usually done using the
'evaluate context' command.

-Mathias

I just received this updated patch, it adds a call to spin_unlock_irqrestore() and a comment for the code.

The comment says that LPM with isochronous endpoints causes config_EP to fail.

---

diff -Naur host.ori/xhci.c host/xhci.c
--- host.ori/xhci.c 2017-11-02 01:50:04.544937000 +0800
+++ host/xhci.c 2017-11-08 01:40:38.929681000 +0800
@@ -4001,6 +4001,20 @@
    return 0;
  }

+ // Workaround: Enable LPM when a SS ISOC EP is opened will cause all subsequent config_EP fail + if ((xhci->quirks & XHCI_SET_MEL_STUCK) && (USB_SPEED_SUPER == udev->speed) && (max_exit_latency > 0))
+ {
+   int i;
+   for (i = 0; i < 31; i++) {
+     struct xhci_ep_ctx *ep_ctx = xhci_get_ep_ctx(xhci, virt_dev->out_ctx, i);
+     int ep_type = CTX_TO_EP_TYPE(le32_to_cpu(ep_ctx->ep_info2));
+     if ((ISOC_IN_EP == ep_type) || (ISOC_OUT_EP == ep_type)) {
+       spin_unlock_irqrestore(&xhci->lock, flags);
+       return 0;
+     }
+   }
+ }
+
  /* Attempt to issue an Evaluate Context command to change the MEL. */
  command = xhci->lpm_command;
  ctrl_ctx = xhci_get_input_control_ctx(command->in_ctx);
diff -Naur host.ori/xhci.h host/xhci.h
--- host.ori/xhci.h 2017-11-02 01:50:04.544937000 +0800
+++ host/xhci.h 2017-11-02 23:41:17.000000000 +0800
@@ -1650,6 +1650,8 @@
 #define XHCI_SSIC_PORT_UNUSED  (1 << 22)
 #define XHCI_NO_64BIT_SUPPORT  (1 << 23)
 #define XHCI_MISSING_CAS (1 << 24)
+#define XHCI_SET_MEL_STUCK (1 << 31)
+
  unsigned int    num_active_eps;
  unsigned int    limit_active_eps;
  /* There are two roothubs to keep track of bus suspend info for */
diff -Naur host.ori/xhci-pci.c host/xhci-pci.c
--- host.ori/xhci-pci.c 2017-11-02 01:50:04.536937000 +0800
+++ host/xhci-pci.c 2017-11-08 01:40:38.929681000 +0800
@@ -168,6 +168,7 @@
     pdev->device == PCI_DEVICE_ID_INTEL_BROXTON_B_XHCI ||
     pdev->device == PCI_DEVICE_ID_INTEL_APL_XHCI)) {
    xhci->quirks |= XHCI_PME_STUCK_QUIRK;
+   xhci->quirks |= XHCI_SET_MEL_STUCK;
  }
  if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
     pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI) {

---


~
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hi Mathias,

Not sure if you saw my second diff, it is slightly different than the first. Since I am going between the developer and the maintainers is there something I can do to move this investigation along?

This patch provides a fix for the serious problem we have found with USB3 isochronous devices running on newer Intel hardware.

Perhaps you or someone else at Intel that is very familiar with the USB hardware can better understand the implications of this patch.

Thanks,
Curt

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux