Re: usb: dwc3: gadget performance insight

Hi Jake,

On 4/16/2024 7:20 AM, Grossman, Jake wrote:
Hello,

We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8 processor, but we are seeing issues with performance.

As a comparison, utilizing iperf3 to benchmark, we are able to see ~230Mbit/s with an RNDIS gadget, and ~900Mbit/s with a hardware USB-to-Ethernet peripheral.


It might help to mention which USB-to-Ethernet adapter you are using in the comparison, since some vendors implement optimizations such as data aggregation.

Also, in which direction are you measuring these numbers (i.e., USB IN or OUT transfers)?

Looking at the output of perf, we are seeing that with all of the gadget drivers (RNDIS, UVC, ACM), there is significant time spent spinning in an IRQ context that does not occur with the hardware peripheral. This seems like it might be related to the interrupt handler as described here: https://docs.kernel.org/usb/dwc3.html.

 1. We have not yet acquired technical documentation regarding the DWC3
    module.  Do you have a list of the DWC3 commands that have high
    latency (~1ms)?

The DWC3 gadget driver nowadays uses the Update Transfer (updatexfer) command, whereas ages ago it would only queue with Start Transfer (startxfer) after every XferNotReady event. That shift streamlined how software tells the controller that new TRBs have been submitted to an endpoint's TRB ring while a transfer is already in progress.

 2. Do you believe that implementing a per endpoint IRQ framework will
    resolve the large disparity in performance?  If not, do you have any
    insight into what the root cause might be?


Honestly, based on previous throughput debugging, most of the problems were at the function driver level rather than in the UDC. I'll echo what Greg said about RNDIS: along with the security concerns, it isn't the most optimized function for IP data transfers. In my experience, the NCM class with packet framing gives much better numbers than the default RNDIS configuration, since data aggregation reduces the number of interrupts per IP packet.
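
To put rough numbers on the aggregation point, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: one completion interrupt per USB transfer, 1500-byte IP packets, and a hypothetical 16 KiB aggregation buffer. Apart from the ~230 Mbit/s figure quoted above, none of these values come from the actual setup, and framing overhead is ignored.

```python
def packets_per_second(throughput_mbit, packet_bytes=1500):
    """IP packets per second needed to sustain the given throughput."""
    return throughput_mbit * 1_000_000 / 8 / packet_bytes


def completions_per_second(throughput_mbit, packet_bytes=1500, aggregation_bytes=0):
    """Estimated transfer completions per second.

    Assumption: each USB transfer raises one completion interrupt.
    With aggregation_bytes=0 every IP packet is its own transfer;
    otherwise as many whole packets as fit share one transfer.
    """
    pps = packets_per_second(throughput_mbit, packet_bytes)
    packets_per_transfer = max(1, aggregation_bytes // packet_bytes)
    return pps / packets_per_transfer


if __name__ == "__main__":
    rate = 230  # Mbit/s, the RNDIS figure reported in this thread
    print("no aggregation:", round(completions_per_second(rate)))
    # 16384 is a hypothetical buffer size, not a measured or default value
    print("16 KiB aggregation:", round(completions_per_second(rate, aggregation_bytes=16384)))
```

Under these assumptions, aggregation cuts the completion rate by roughly the number of packets that fit in one transfer, which is why the function driver's framing tends to matter more than the UDC.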

Thinh will probably have some more comments; I'm just sharing my two cents :). It would be good to get more details on the points above before we can guide you in the right direction.


Thanks
Wesley Cheng

Thank you for your time and insight,

Jake Grossman




