On 6/22/2021 1:09 PM, Ferry Toth wrote: > Hi > > Op 22-06-2021 om 20:38 schreef Wesley Cheng: >> >> On 6/19/2021 5:40 AM, Ferry Toth wrote: >>> Hi >>> >>> Op 18-06-2021 om 00:25 schreef Wesley Cheng: >>>> Hi, >>>> >>>> On 6/17/2021 2:55 PM, Ferry Toth wrote: >>>>> Hi >>>>> >>>>> Op 17-06-2021 om 23:48 schreef Wesley Cheng: >>>>>> Hi, >>>>>> >>>>>> On 6/17/2021 2:01 PM, Ferry Toth wrote: >>>>>>> Hi >>>>>>> >>>>>>> Op 17-06-2021 om 11:58 schreef Wesley Cheng: >>>>>>>> Changes in V10: >>>>>>>> - Fixed compilation errors in config where OF is not used >>>>>>>> (error due to >>>>>>>> unknown symbol for of_add_property()). Add of_add_property() >>>>>>>> stub. >>>>>>>> - Fixed compilation warning for incorrect argument being >>>>>>>> passed to >>>>>>>> dwc3_mdwidth >>>>>>> This fixes the OOPS I had in V9. I do not see any change in >>>>>>> performance >>>>>>> on Merrifield though. >>>>>> I see...thanks Ferry! With your testing, are you writing to the >>>>>> device's >>>>>> internal storage (ie UFS, eMMC, etc...), or did you use a ramdisk as >>>>>> well? >>>>> In this case I just tested the EEM path using iperf3. >>>>> >>>> Got it. I don't believe f_eem will use a high enough (if at all) >>>> bMaxBurst value to change the TXFIFO size. >>>> >>>>>> If not with a ramdisk, we might want to give that a try to avoid the >>>>>> storage path being the bottleneck. You can use "dd" to create an >>>>>> empty >>>>>> file, and then just use that as the LUN's backing file. >>>>>> >>>>>> echo ramdisk.img > >>>>>> /sys/kernel/config/usb_gadget/g1/functions/mass_storage.0/lun.0/file >>>>> Ah, why didn't I think of that. I have currently mass storage setup >>>>> with >>>>> eMMC but it seems that is indeed the bottleneck. >>>>> >>> I created a 64MB disk following the instructions here >>> http://www.linux-usb.org/gadget/file_storage.html (that seems a little >>> outdated, at least I can not start the first partition at sector 8, but >>> minimum 2048), and added a test file on it. >>> >>> I then copy the file to /dev/shm prior to setting configfs (composite >>> device gser/eem/mass_storage/uac2). >>> >>> journal shows: >>> >>> kernel: Mass Storage Function, version: 2009/09/11 >>> kernel: LUN: removable file: (no medium) >>> >>> I don't know what that means, because I see the test file on the >>> ramdisk. >>> >>> Then I again used gnome disks to benchmark (read/write 10MB): >>> >>> With V10 on top v5.13.0-rc5: >>> >>> R/W speed = 35.6/35.8MB/s, access time 0.35ms >>> >>> With no patches on top v5.12.0: >>> >>> R/W speed = 35.7/36.1MB/s, access time 0.35ms >> Hi Ferry, >> >>> I see no speed difference (and it's about the same as with the eMMC >>> backed disk). But the patches are causing a new call trace >>> >> Would you happen to know what DWC3 controller revision the device is >> using? The callstack print occurs, because it looks like it ran out of >> internal memory, although there should be logic present for making sure >> that at least there is enough room for 1 FIFO per endpoint. (possibly >> the logic/math depends on the controller revision) Hi Ferry, > > Do you know where I could find that in a file on the device? > Maybe you can just dump the DWC3 registers? cat /sys/kernel/debug/usb/<controller name>/regdump Was going to ask for the same to confirm the TXFIFO sizes from your results below :). > Otherwise, I'm hoping Andy will know? > >> >> Also, is there a way to use just a mass storage only composition? Based >> on the above observation, that probably means that the mass storage >> interface wasn't resized at all, because the configuration took up a lot >> of the internal FIFO space. > > Sure, it's configured through configfs. With only mass_storage I have: > > With V10 on top v5.13.0-rc5: > > R/W speed = 41,6/39,3MB/s, access time 0.33ms > > With no patches on top v5.12.0: > > R/W speed = 41,1/38,7MB/s, access time 0.38ms > Thanks Ferry! Could you collect the regdump, so I can confirm the two things mentioned? Thanks Wesley Cheng >> Thanks >> Wesley Cheng >> >>> kernel: using random self ethernet address >>> kernel: using random host ethernet address >>> kernel: Mass Storage Function, version: 2009/09/11 >>> kernel: LUN: removable file: (no medium) >>> kernel: usb0: HOST MAC aa:bb:cc:dd:ee:f2 >>> kernel: usb0: MAC aa:bb:cc:dd:ee:f1 >>> kernel: IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready >>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>> depth:115540359 >>> kernel: ------------[ cut here ]------------ >>> kernel: WARNING: CPU: 0 PID: 594 at drivers/usb/gadget/udc/core.c:278 >>> usb_ep_queue+0x75/0x80 >>> kernel: Modules linked in: usb_f_uac2 u_audio usb_f_mass_storage >>> usb_f_eem u_ether usb_f_serial u_serial libcomposite rfcomm iptable_nat >>> bnep snd_sof_nocodec spi_pxa2> >>> kernel: CPU: 0 PID: 594 Comm: irq/14-dwc3 Not tainted >>> 5.13.0-rc5-edison-acpi-standard #1 >>> kernel: Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 >>> 2015.01.21:18.19.48 >>> kernel: RIP: 0010:usb_ep_queue+0x75/0x80 >>> kernel: Code: 01 73 e4 48 8b 05 fb 63 06 01 48 85 c0 74 12 48 8b 78 08 >>> 44 89 e9 4c 89 e2 48 89 ee e8 74 05 00 00 44 89 e8 5d 41 5c 41 5d c3 >>> <0f> 0b 41 bd 94 ff ff ff > >>> kernel: RSP: 0000:ffff91eec083fc98 EFLAGS: 00010082 >>> kernel: RAX: ffff8af20357d960 RBX: 0000000000000000 RCX: >>> ffff8af202f06400 >>> kernel: RDX: 0000000000000a20 RSI: ffff8af208785780 RDI: >>> ffff8af202e9ae00 >>> kernel: RBP: ffff8af202e9ae00 R08: 00000000000000c0 R09: >>> ffff8af208785780 >>> kernel: R10: 00000000ffffe000 R11: 3fffffffffffffff R12: >>> ffff8af208785780 >>> kernel: R13: 0000000000000000 R14: ffff8af202e9ae00 R15: >>> ffff8af203e26cc0 >>> kernel: FS: 0000000000000000(0000) GS:ffff8af23e200000(0000) >>> knlGS:0000000000000000 >>> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> kernel: CR2: 000055e2c21f2100 CR3: 0000000003b38000 CR4: >>> 00000000001006f0 >>> kernel: Call Trace: >>> kernel: u_audio_start_playback+0x107/0x1a0 [u_audio] >>> kernel: composite_setup+0x224/0x1ba0 [libcomposite] >>> kernel: ? dwc3_gadget_ep_queue+0xf6/0x1a0 >>> kernel: ? usb_ep_queue+0x2a/0x80 >>> kernel: ? configfs_composite_setup+0x6b/0x90 [libcomposite] >>> kernel: configfs_composite_setup+0x6b/0x90 [libcomposite] >>> kernel: dwc3_ep0_interrupt+0x469/0xa80 >>> kernel: dwc3_thread_interrupt+0x8ee/0xf40 >>> kernel: ? __wake_up_common_lock+0x85/0xb0 >>> kernel: ? disable_irq_nosync+0x10/0x10 >>> kernel: irq_thread_fn+0x1b/0x60 >>> kernel: irq_thread+0xd6/0x170 >>> kernel: ? irq_thread_check_affinity+0x70/0x70 >>> kernel: ? irq_forced_thread_fn+0x70/0x70 >>> kernel: kthread+0x116/0x130 >>> kernel: ? kthread_create_worker_on_cpu+0x60/0x60 >>> kernel: ret_from_fork+0x22/0x30 >>> kernel: ---[ end trace e5b9e28058c53584 ]--- >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: dwc3 dwc3.0.auto: request 000000003c32dcc5 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: request 00000000b2512aa9 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>> depth:115540359 >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: dwc3 dwc3.0.auto: request 00000000b2512aa9 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: request 00000000036ac129 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>> depth:115540359 >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: dwc3 dwc3.0.auto: request 00000000ad1b8c18 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: request 00000000fbc71244 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>> depth:115540359 >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: dwc3 dwc3.0.auto: request 00000000fbc71244 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: request 00000000ad1b8c18 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>> depth:115540359 >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>> kernel: dwc3 dwc3.0.auto: request 000000003c32dcc5 was not queued to >>> ep5in >>> kernel: dwc3 dwc3.0.auto: request 00000000b2512aa9 was not queued to >>> ep5in >>> >>> Removing uac2 from the config makes the call trace go away, but the R/W >>> speed does not change. >>> >>>> :), not a problem...I've been working on getting the ideal set up for >>>> the performance profiling for awhile, so anything I can do to make sure >>>> we get some good results. >>>> >>>>> I'll try with a ramdisk and let you know. >>>>> >>>> Thanks again for the testing, Ferry. >>>> >>>> Thanks >>>> Wesley Cheng >>>> >>>>>> Thanks >>>>>> Wesley Cheng >>>>>> >>>>>>>> Changes in V9: >>>>>>>> - Fixed incorrect patch in series. Removed changes in DTSI, as >>>>>>>> dwc3-qcom will >>>>>>>> add the property by default from the kernel. >>>>>>>> >>>>>>>> Changes in V8: >>>>>>>> - Rebased to usb-testing >>>>>>>> - Using devm_kzalloc for adding txfifo property in dwc3-qcom >>>>>>>> - Removed DWC3 QCOM ACPI property for enabling the txfifo >>>>>>>> resize >>>>>>>> >>>>>>>> Changes in V7: >>>>>>>> - Added a new property tx-fifo-max-num for limiting how much >>>>>>>> fifo >>>>>>>> space the >>>>>>>> resizing logic can allocate for endpoints with large burst >>>>>>>> values. This >>>>>>>> can differ across platforms, and tie in closely with overall >>>>>>>> system latency. >>>>>>>> - Added recommended checks for DWC32. >>>>>>>> - Added changes to set the tx-fifo-resize property from >>>>>>>> dwc3-qcom by >>>>>>>> default >>>>>>>> instead of modifying the current DTSI files. >>>>>>>> - Added comments on all APIs/variables introduced. >>>>>>>> - Updated the DWC3 YAML to include a better description of the >>>>>>>> tx-fifo-resize >>>>>>>> property and added an entry for tx-fifo-max-num. >>>>>>>> >>>>>>>> Changes in V6: >>>>>>>> - Rebased patches to usb-testing. >>>>>>>> - Renamed to PATCH series instead of RFC. >>>>>>>> - Checking for fs_descriptors instead of ss_descriptors for >>>>>>>> determining the >>>>>>>> endpoint count for a particular configuration. >>>>>>>> - Re-ordered patch series to fix patch dependencies. >>>>>>>> >>>>>>>> Changes in V5: >>>>>>>> - Added check_config() logic, which is used to communicate the >>>>>>>> number of EPs >>>>>>>> used in a particular configuration. Based on this, the DWC3 >>>>>>>> gadget driver >>>>>>>> has the ability to know the maximum number of eps utilized in >>>>>>>> all >>>>>>>> configs. >>>>>>>> This helps reduce unnecessary allocation to unused eps, >>>>>>>> and will >>>>>>>> catch fifo >>>>>>>> allocation issues at bind() time. >>>>>>>> - Fixed variable declaration to single line per variable, and >>>>>>>> reverse xmas. >>>>>>>> - Created a helper for fifo clearing, which is used by ep0.c >>>>>>>> >>>>>>>> Changes in V4: >>>>>>>> - Removed struct dwc3* as an argument for >>>>>>>> dwc3_gadget_resize_tx_fifos() >>>>>>>> - Removed WARN_ON(1) in case we run out of fifo space >>>>>>>> Changes in V3: >>>>>>>> - Removed "Reviewed-by" tags >>>>>>>> - Renamed series back to RFC >>>>>>>> - Modified logic to ensure that fifo_size is reset if we >>>>>>>> pass the >>>>>>>> minimum >>>>>>>> threshold. Tested with binding multiple FDs requesting 6 >>>>>>>> FIFOs. >>>>>>>> >>>>>>>> Changes in V2: >>>>>>>> - Modified TXFIFO resizing logic to ensure that each EP is >>>>>>>> reserved a >>>>>>>> FIFO. >>>>>>>> - Removed dev_dbg() prints and fixed typos from patches >>>>>>>> - Added some more description on the dt-bindings commit message >>>>>>>> >>>>>>>> Currently, there is no functionality to allow for resizing the >>>>>>>> TXFIFOs, and >>>>>>>> relying on the HW default setting for the TXFIFO depth. In most >>>>>>>> cases, the >>>>>>>> HW default is probably sufficient, but for USB compositions that >>>>>>>> contain >>>>>>>> multiple functions that require EP bursting, the default settings >>>>>>>> might not be enough. Also to note, the current SW will assign an >>>>>>>> EP to a >>>>>>>> function driver w/o checking to see if the TXFIFO size for that >>>>>>>> particular >>>>>>>> EP is large enough. (this is a problem if there are multiple HW >>>>>>>> defined >>>>>>>> values for the TXFIFO size) >>>>>>>> >>>>>>>> It is mentioned in the SNPS databook that a minimum of TX FIFO >>>>>>>> depth = 3 >>>>>>>> is required for an EP that supports bursting. Otherwise, there >>>>>>>> may be >>>>>>>> frequent occurences of bursts ending. For high bandwidth >>>>>>>> functions, >>>>>>>> such as data tethering (protocols that support data aggregation), >>>>>>>> mass >>>>>>>> storage, and media transfer protocol (over FFS), the bMaxBurst >>>>>>>> value >>>>>>>> can be >>>>>>>> large, and a bigger TXFIFO depth may prove to be beneficial in >>>>>>>> terms >>>>>>>> of USB >>>>>>>> throughput. (which can be associated to system access latency, >>>>>>>> etc...) It >>>>>>>> allows for a more consistent burst of traffic, w/o any >>>>>>>> interruptions, as >>>>>>>> data is readily available in the FIFO. >>>>>>>> >>>>>>>> With testing done using the mass storage function driver, the >>>>>>>> results >>>>>>>> show >>>>>>>> that with a larger TXFIFO depth, the bandwidth increased >>>>>>>> significantly. >>>>>>>> >>>>>>>> Test Parameters: >>>>>>>> - Platform: Qualcomm SM8150 >>>>>>>> - bMaxBurst = 6 >>>>>>>> - USB req size = 256kB >>>>>>>> - Num of USB reqs = 16 >>>>>>>> - USB Speed = Super-Speed >>>>>>>> - Function Driver: Mass Storage (w/ ramdisk) >>>>>>>> - Test Application: CrystalDiskMark >>>>>>>> >>>>>>>> Results: >>>>>>>> >>>>>>>> TXFIFO Depth = 3 max packets >>>>>>>> >>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>> ------------------------------------------- >>>>>>>> Sequential|1 GB x | >>>>>>>> Read |9 loops | 193.60 >>>>>>>> | | 195.86 >>>>>>>> | | 184.77 >>>>>>>> | | 193.60 >>>>>>>> ------------------------------------------- >>>>>>>> >>>>>>>> TXFIFO Depth = 6 max packets >>>>>>>> >>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>> ------------------------------------------- >>>>>>>> Sequential|1 GB x | >>>>>>>> Read |9 loops | 287.35 >>>>>>>> | | 304.94 >>>>>>>> | | 289.64 >>>>>>>> | | 293.61 >>>>>>>> ------------------------------------------- >>>>>>>> >>>>>>>> Wesley Cheng (6): >>>>>>>> usb: gadget: udc: core: Introduce check_config to verify USB >>>>>>>> configuration >>>>>>>> usb: gadget: configfs: Check USB configuration before adding >>>>>>>> usb: dwc3: Resize TX FIFOs to meet EP bursting requirements >>>>>>>> of: Add stub for of_add_property() >>>>>>>> usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by >>>>>>>> default >>>>>>>> dt-bindings: usb: dwc3: Update dwc3 TX fifo properties >>>>>>>> >>>>>>>> .../devicetree/bindings/usb/snps,dwc3.yaml | 15 +- >>>>>>>> drivers/usb/dwc3/core.c | 9 + >>>>>>>> drivers/usb/dwc3/core.h | 15 ++ >>>>>>>> drivers/usb/dwc3/dwc3-qcom.c | 9 + >>>>>>>> drivers/usb/dwc3/ep0.c | 2 + >>>>>>>> drivers/usb/dwc3/gadget.c | 212 >>>>>>>> +++++++++++++++++++++ >>>>>>>> drivers/usb/gadget/configfs.c | 22 +++ >>>>>>>> drivers/usb/gadget/udc/core.c | 25 +++ >>>>>>>> include/linux/of.h | 5 + >>>>>>>> include/linux/usb/gadget.h | 5 + >>>>>>>> 10 files changed, 317 insertions(+), 2 deletions(-) >>>>>>>> -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project