Hi Jake,
On 4/16/2024 7:20 AM, Grossman, Jake wrote:
Hello,
We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8
processor, but we are seeing issues with performance.
As a comparison, utilizing iperf3 to benchmark, we are able to see
~230Mbit/s with an RNDIS gadget, and ~900Mbit/s with a hardware
USB-to-Ethernet peripheral.
It might help to mention which USB-to-Ethernet adapter is being used in
your comparison, since some vendors may have enhanced optimizations such
as data aggregation, etc.
Also, in which direction are you getting these numbers (i.e., USB IN or
OUT transfers)?
Looking at the output of perf, we are seeing that with all of the gadget
drivers (RNDIS, UVC, ACM), there is significant time spent spinning in
an IRQ context that does not occur with the hardware peripheral. This
seems like it might be related to the interrupt handler as described
here: https://docs.kernel.org/usb/dwc3.html.
1. We have not yet acquired technical documentation regarding the DWC3
module. Do you have a list of the DWC3 commands that have high
latency (~1ms)?
The DWC3 gadget driver nowadays uses the updatexfer command, whereas
ages ago it would only queue with startxfer after every xfernotready
event. That shift definitely optimized how SW can notify the controller
when new TRBs are submitted to the endpoint's TRB ring while a transfer
is already in progress.
2. Do you believe that implementing a per endpoint IRQ framework will
resolve the large disparity in performance? If not, do you have any
insight into what the root cause might be?
Honestly, based on previous throughput debugging, most of the problems
were at the function driver level, less so in the UDC. I'll echo what
Greg says about RNDIS and add that, along with the security concerns, it
isn't the most optimized function for IP data transfers. In my
experience, the NCM class with packet framing will result in much better
numbers than the default RNDIS configuration, as allowing data
aggregation will lessen the number of interrupts per IP packet.
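As a rough back-of-the-envelope illustration of why aggregation matters,
here is a small sketch. The 1500-byte packet size, the 16 KiB aggregate
size, and the idea of one completion event per USB transfer are all
assumptions for illustration only; they are not measured values or
driver defaults, and framing header overhead is ignored.

```python
def transfers_per_second(throughput_bps, packet_bytes, packets_per_transfer):
    """Estimate USB transfers per second needed to sustain a throughput.

    Assumes every transfer carries exactly `packets_per_transfer` full
    packets and ignores any framing/header overhead.
    """
    packets_per_sec = throughput_bps / (packet_bytes * 8)
    return packets_per_sec / packets_per_transfer


# Assumed values for illustration only.
TARGET_BPS = 900e6          # the ~900 Mbit/s figure from the comparison
PACKET_BYTES = 1500         # assumed IP packet size
AGGREGATE_BYTES = 16 * 1024  # hypothetical aggregate buffer size

# How many full packets fit in one aggregated transfer.
pkts_per_aggregate = AGGREGATE_BYTES // PACKET_BYTES

for label, n in (("one packet per transfer", 1),
                 ("aggregated transfers", pkts_per_aggregate)):
    rate = transfers_per_second(TARGET_BPS, PACKET_BYTES, n)
    print(f"{label}: ~{rate:,.0f} transfers/s")
```

Under those assumptions, the number of transfers (and so, roughly, the
number of completion events the CPU has to service) drops by about the
number of packets packed into each transfer.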
Thinh will probably have some more comments, but just sharing my two
cents :). Might be good to get some more details on the above before we
can guide you in the right direction.
Thanks
Wesley Cheng
Thank you for your time and insight,
Jake Grossman