On 6/22/2021 3:12 PM, Ferry Toth wrote: > Hi > > Op 23-06-2021 om 00:00 schreef Wesley Cheng: >> >> On 6/22/2021 1:09 PM, Ferry Toth wrote: >>> Hi >>> >>> Op 22-06-2021 om 20:38 schreef Wesley Cheng: >>>> On 6/19/2021 5:40 AM, Ferry Toth wrote: >>>>> Hi >>>>> >>>>> Op 18-06-2021 om 00:25 schreef Wesley Cheng: >>>>>> Hi, >>>>>> >>>>>> On 6/17/2021 2:55 PM, Ferry Toth wrote: >>>>>>> Hi >>>>>>> >>>>>>> Op 17-06-2021 om 23:48 schreef Wesley Cheng: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On 6/17/2021 2:01 PM, Ferry Toth wrote: >>>>>>>>> Hi >>>>>>>>> >>>>>>>>> Op 17-06-2021 om 11:58 schreef Wesley Cheng: >>>>>>>>>> Changes in V10: >>>>>>>>>> - Fixed compilation errors in config where OF is not used >>>>>>>>>> (error due to >>>>>>>>>> unknown symbol for of_add_property()). Add >>>>>>>>>> of_add_property() >>>>>>>>>> stub. >>>>>>>>>> - Fixed compilation warning for incorrect argument being >>>>>>>>>> passed to >>>>>>>>>> dwc3_mdwidth >>>>>>>>> This fixes the OOPS I had in V9. I do not see any change in >>>>>>>>> performance >>>>>>>>> on Merrifield though. >>>>>>>> I see...thanks Ferry! With your testing, are you writing to the >>>>>>>> device's >>>>>>>> internal storage (ie UFS, eMMC, etc...), or did you use a >>>>>>>> ramdisk as >>>>>>>> well? >>>>>>> In this case I just tested the EEM path using iperf3. >>>>>>> >>>>>> Got it. I don't believe f_eem will use a high enough (if at all) >>>>>> bMaxBurst value to change the TXFIFO size. >>>>>> >>>>>>>> If not with a ramdisk, we might want to give that a try to avoid >>>>>>>> the >>>>>>>> storage path being the bottleneck. You can use "dd" to create an >>>>>>>> empty >>>>>>>> file, and then just use that as the LUN's backing file. >>>>>>>> >>>>>>>> echo ramdisk.img > >>>>>>>> /sys/kernel/config/usb_gadget/g1/functions/mass_storage.0/lun.0/file >>>>>>>> >>>>>>> Ah, why didn't I think of that. I have currently mass storage setup >>>>>>> with >>>>>>> eMMC but it seems that is indeed the bottleneck. >>>>>>> >>>>> I created a 64MB disk following the instructions here >>>>> http://www.linux-usb.org/gadget/file_storage.html (that seems a little >>>>> outdated, at least I can not start the first partition at sector 8, >>>>> but >>>>> minimum 2048), and added a test file on it. >>>>> >>>>> I then copy the file to /dev/shm prior to setting configfs (composite >>>>> device gser/eem/mass_storage/uac2). >>>>> >>>>> journal shows: >>>>> >>>>> kernel: Mass Storage Function, version: 2009/09/11 >>>>> kernel: LUN: removable file: (no medium) >>>>> >>>>> I don't know what that means, because I see the test file on the >>>>> ramdisk. >>>>> >>>>> Then I again used gnome disks to benchmark (read/write 10MB): >>>>> >>>>> With V10 on top v5.13.0-rc5: >>>>> >>>>> R/W speed = 35.6/35.8MB/s, access time 0.35ms >>>>> >>>>> With no patches on top v5.12.0: >>>>> >>>>> R/W speed = 35.7/36.1MB/s, access time 0.35ms >>>> Hi Ferry, >>>> >>>>> I see no speed difference (and it's about the same as with the eMMC >>>>> backed disk). But the patches are causing a new call trace >>>>> >>>> Would you happen to know what DWC3 controller revision the device is >>>> using? The callstack print occurs, because it looks like it ran out of >>>> internal memory, although there should be logic present for making sure >>>> that at least there is enough room for 1 FIFO per endpoint. (possibly >>>> the logic/math depends on the controller revision) >> Hi Ferry, >> >>> Do you know where I could find that in a file on the device? >>> >> Maybe you can just dump the DWC3 registers? >> cat /sys/kernel/debug/usb/<controller name>/regdump > > This isin host mode: > > ~# cat /sys/kernel/debug/usb/dwc3.0.auto/regdump > GSBUSCFG0 = 0x00000006 > GSBUSCFG1 = 0x00000f00 > GTXTHRCFG = 0x230a0000 > GRXTHRCFG = 0x22800000 > GCTL = 0x45801002 > GEVTEN = 0x00000000 > GSTS = 0x3e800001 > GUCTL1 = 0x00000000 > GSNPSID = 0x5533250a > GGPIO = 0x00000000 > GUID = 0x00050d00 > GUCTL = 0x0200ce00 > GBUSERRADDR0 = 0x00000000 > GBUSERRADDR1 = 0x00000000 ... > Hi Ferry, Great! Thanks, that got me the information regarding the DWC3 revision you're using. (its version 2.50a) I checked the DWC3 databook and looks like certain versions have some different math to calculate the TXFIFO size. I have taken this into account and factored that in in the next revision. Is there a way you could collect the regdump in device mode w/ mass storage only composition after enumerating with the PC? That would help confirm how the TXFIFO resizing logic is working on your platform. Thanks Wesley Cheng >> Was going to ask for the same to confirm the TXFIFO sizes from your >> results below :). >> >>> Otherwise, I'm hoping Andy will know? >>> >>>> Also, is there a way to use just a mass storage only composition? >>>> Based >>>> on the above observation, that probably means that the mass storage >>>> interface wasn't resized at all, because the configuration took up a >>>> lot >>>> of the internal FIFO space. >>> Sure, it's configured through configfs. With only mass_storage I have: >>> >>> With V10 on top v5.13.0-rc5: >>> >>> R/W speed = 41,6/39,3MB/s, access time 0.33ms >>> >>> With no patches on top v5.12.0: >>> >>> R/W speed = 41,1/38,7MB/s, access time 0.38ms >>> >> Thanks Ferry! Could you collect the regdump, so I can confirm the two >> things mentioned? >> >> Thanks >> Wesley Cheng >> >>>> Thanks >>>> Wesley Cheng >>>> >>>>> kernel: using random self ethernet address >>>>> kernel: using random host ethernet address >>>>> kernel: Mass Storage Function, version: 2009/09/11 >>>>> kernel: LUN: removable file: (no medium) >>>>> kernel: usb0: HOST MAC aa:bb:cc:dd:ee:f2 >>>>> kernel: usb0: MAC aa:bb:cc:dd:ee:f1 >>>>> kernel: IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready >>>>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>>>> depth:115540359 >>>>> kernel: ------------[ cut here ]------------ >>>>> kernel: WARNING: CPU: 0 PID: 594 at drivers/usb/gadget/udc/core.c:278 >>>>> usb_ep_queue+0x75/0x80 >>>>> kernel: Modules linked in: usb_f_uac2 u_audio usb_f_mass_storage >>>>> usb_f_eem u_ether usb_f_serial u_serial libcomposite rfcomm >>>>> iptable_nat >>>>> bnep snd_sof_nocodec spi_pxa2> >>>>> kernel: CPU: 0 PID: 594 Comm: irq/14-dwc3 Not tainted >>>>> 5.13.0-rc5-edison-acpi-standard #1 >>>>> kernel: Hardware name: Intel Corporation Merrifield/BODEGA BAY, >>>>> BIOS 542 >>>>> 2015.01.21:18.19.48 >>>>> kernel: RIP: 0010:usb_ep_queue+0x75/0x80 >>>>> kernel: Code: 01 73 e4 48 8b 05 fb 63 06 01 48 85 c0 74 12 48 8b 78 08 >>>>> 44 89 e9 4c 89 e2 48 89 ee e8 74 05 00 00 44 89 e8 5d 41 5c 41 5d c3 >>>>> <0f> 0b 41 bd 94 ff ff ff > >>>>> kernel: RSP: 0000:ffff91eec083fc98 EFLAGS: 00010082 >>>>> kernel: RAX: ffff8af20357d960 RBX: 0000000000000000 RCX: >>>>> ffff8af202f06400 >>>>> kernel: RDX: 0000000000000a20 RSI: ffff8af208785780 RDI: >>>>> ffff8af202e9ae00 >>>>> kernel: RBP: ffff8af202e9ae00 R08: 00000000000000c0 R09: >>>>> ffff8af208785780 >>>>> kernel: R10: 00000000ffffe000 R11: 3fffffffffffffff R12: >>>>> ffff8af208785780 >>>>> kernel: R13: 0000000000000000 R14: ffff8af202e9ae00 R15: >>>>> ffff8af203e26cc0 >>>>> kernel: FS: 0000000000000000(0000) GS:ffff8af23e200000(0000) >>>>> knlGS:0000000000000000 >>>>> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> kernel: CR2: 000055e2c21f2100 CR3: 0000000003b38000 CR4: >>>>> 00000000001006f0 >>>>> kernel: Call Trace: >>>>> kernel: u_audio_start_playback+0x107/0x1a0 [u_audio] >>>>> kernel: composite_setup+0x224/0x1ba0 [libcomposite] >>>>> kernel: ? dwc3_gadget_ep_queue+0xf6/0x1a0 >>>>> kernel: ? usb_ep_queue+0x2a/0x80 >>>>> kernel: ? configfs_composite_setup+0x6b/0x90 [libcomposite] >>>>> kernel: configfs_composite_setup+0x6b/0x90 [libcomposite] >>>>> kernel: dwc3_ep0_interrupt+0x469/0xa80 >>>>> kernel: dwc3_thread_interrupt+0x8ee/0xf40 >>>>> kernel: ? __wake_up_common_lock+0x85/0xb0 >>>>> kernel: ? disable_irq_nosync+0x10/0x10 >>>>> kernel: irq_thread_fn+0x1b/0x60 >>>>> kernel: irq_thread+0xd6/0x170 >>>>> kernel: ? irq_thread_check_affinity+0x70/0x70 >>>>> kernel: ? irq_forced_thread_fn+0x70/0x70 >>>>> kernel: kthread+0x116/0x130 >>>>> kernel: ? kthread_create_worker_on_cpu+0x60/0x60 >>>>> kernel: ret_from_fork+0x22/0x30 >>>>> kernel: ---[ end trace e5b9e28058c53584 ]--- >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: dwc3 dwc3.0.auto: request 000000003c32dcc5 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: request 00000000b2512aa9 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>>>> depth:115540359 >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: dwc3 dwc3.0.auto: request 00000000b2512aa9 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: request 00000000036ac129 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>>>> depth:115540359 >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: dwc3 dwc3.0.auto: request 00000000ad1b8c18 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: request 00000000fbc71244 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>>>> depth:115540359 >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: dwc3 dwc3.0.auto: request 00000000fbc71244 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: request 00000000ad1b8c18 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: Fifosize(2154) > RAM size(2022) ep5in >>>>> depth:115540359 >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: configfs-gadget gadget: u_audio_start_playback:451 Error! >>>>> kernel: dwc3 dwc3.0.auto: request 000000003c32dcc5 was not queued to >>>>> ep5in >>>>> kernel: dwc3 dwc3.0.auto: request 00000000b2512aa9 was not queued to >>>>> ep5in >>>>> >>>>> Removing uac2 from the config makes the call trace go away, but the >>>>> R/W >>>>> speed does not change. >>>>> >>>>>> :), not a problem...I've been working on getting the ideal set up for >>>>>> the performance profiling for awhile, so anything I can do to make >>>>>> sure >>>>>> we get some good results. >>>>>> >>>>>>> I'll try with a ramdisk and let you know. >>>>>>> >>>>>> Thanks again for the testing, Ferry. >>>>>> >>>>>> Thanks >>>>>> Wesley Cheng >>>>>> >>>>>>>> Thanks >>>>>>>> Wesley Cheng >>>>>>>> >>>>>>>>>> Changes in V9: >>>>>>>>>> - Fixed incorrect patch in series. Removed changes in >>>>>>>>>> DTSI, as >>>>>>>>>> dwc3-qcom will >>>>>>>>>> add the property by default from the kernel. >>>>>>>>>> >>>>>>>>>> Changes in V8: >>>>>>>>>> - Rebased to usb-testing >>>>>>>>>> - Using devm_kzalloc for adding txfifo property in dwc3-qcom >>>>>>>>>> - Removed DWC3 QCOM ACPI property for enabling the txfifo >>>>>>>>>> resize >>>>>>>>>> >>>>>>>>>> Changes in V7: >>>>>>>>>> - Added a new property tx-fifo-max-num for limiting how much >>>>>>>>>> fifo >>>>>>>>>> space the >>>>>>>>>> resizing logic can allocate for endpoints with large burst >>>>>>>>>> values. This >>>>>>>>>> can differ across platforms, and tie in closely with >>>>>>>>>> overall >>>>>>>>>> system latency. >>>>>>>>>> - Added recommended checks for DWC32. >>>>>>>>>> - Added changes to set the tx-fifo-resize property from >>>>>>>>>> dwc3-qcom by >>>>>>>>>> default >>>>>>>>>> instead of modifying the current DTSI files. >>>>>>>>>> - Added comments on all APIs/variables introduced. >>>>>>>>>> - Updated the DWC3 YAML to include a better description >>>>>>>>>> of the >>>>>>>>>> tx-fifo-resize >>>>>>>>>> property and added an entry for tx-fifo-max-num. >>>>>>>>>> >>>>>>>>>> Changes in V6: >>>>>>>>>> - Rebased patches to usb-testing. >>>>>>>>>> - Renamed to PATCH series instead of RFC. >>>>>>>>>> - Checking for fs_descriptors instead of ss_descriptors for >>>>>>>>>> determining the >>>>>>>>>> endpoint count for a particular configuration. >>>>>>>>>> - Re-ordered patch series to fix patch dependencies. >>>>>>>>>> >>>>>>>>>> Changes in V5: >>>>>>>>>> - Added check_config() logic, which is used to >>>>>>>>>> communicate the >>>>>>>>>> number of EPs >>>>>>>>>> used in a particular configuration. Based on this, the >>>>>>>>>> DWC3 >>>>>>>>>> gadget driver >>>>>>>>>> has the ability to know the maximum number of eps >>>>>>>>>> utilized in >>>>>>>>>> all >>>>>>>>>> configs. >>>>>>>>>> This helps reduce unnecessary allocation to unused eps, >>>>>>>>>> and will >>>>>>>>>> catch fifo >>>>>>>>>> allocation issues at bind() time. >>>>>>>>>> - Fixed variable declaration to single line per variable, >>>>>>>>>> and >>>>>>>>>> reverse xmas. >>>>>>>>>> - Created a helper for fifo clearing, which is used by ep0.c >>>>>>>>>> >>>>>>>>>> Changes in V4: >>>>>>>>>> - Removed struct dwc3* as an argument for >>>>>>>>>> dwc3_gadget_resize_tx_fifos() >>>>>>>>>> - Removed WARN_ON(1) in case we run out of fifo space >>>>>>>>>> Changes in V3: >>>>>>>>>> - Removed "Reviewed-by" tags >>>>>>>>>> - Renamed series back to RFC >>>>>>>>>> - Modified logic to ensure that fifo_size is reset if we >>>>>>>>>> pass the >>>>>>>>>> minimum >>>>>>>>>> threshold. Tested with binding multiple FDs requesting 6 >>>>>>>>>> FIFOs. >>>>>>>>>> >>>>>>>>>> Changes in V2: >>>>>>>>>> - Modified TXFIFO resizing logic to ensure that each EP is >>>>>>>>>> reserved a >>>>>>>>>> FIFO. >>>>>>>>>> - Removed dev_dbg() prints and fixed typos from patches >>>>>>>>>> - Added some more description on the dt-bindings commit >>>>>>>>>> message >>>>>>>>>> >>>>>>>>>> Currently, there is no functionality to allow for resizing the >>>>>>>>>> TXFIFOs, and >>>>>>>>>> relying on the HW default setting for the TXFIFO depth. In most >>>>>>>>>> cases, the >>>>>>>>>> HW default is probably sufficient, but for USB compositions that >>>>>>>>>> contain >>>>>>>>>> multiple functions that require EP bursting, the default settings >>>>>>>>>> might not be enough. Also to note, the current SW will assign an >>>>>>>>>> EP to a >>>>>>>>>> function driver w/o checking to see if the TXFIFO size for that >>>>>>>>>> particular >>>>>>>>>> EP is large enough. (this is a problem if there are multiple HW >>>>>>>>>> defined >>>>>>>>>> values for the TXFIFO size) >>>>>>>>>> >>>>>>>>>> It is mentioned in the SNPS databook that a minimum of TX FIFO >>>>>>>>>> depth = 3 >>>>>>>>>> is required for an EP that supports bursting. Otherwise, there >>>>>>>>>> may be >>>>>>>>>> frequent occurences of bursts ending. For high bandwidth >>>>>>>>>> functions, >>>>>>>>>> such as data tethering (protocols that support data aggregation), >>>>>>>>>> mass >>>>>>>>>> storage, and media transfer protocol (over FFS), the bMaxBurst >>>>>>>>>> value >>>>>>>>>> can be >>>>>>>>>> large, and a bigger TXFIFO depth may prove to be beneficial in >>>>>>>>>> terms >>>>>>>>>> of USB >>>>>>>>>> throughput. (which can be associated to system access latency, >>>>>>>>>> etc...) It >>>>>>>>>> allows for a more consistent burst of traffic, w/o any >>>>>>>>>> interruptions, as >>>>>>>>>> data is readily available in the FIFO. >>>>>>>>>> >>>>>>>>>> With testing done using the mass storage function driver, the >>>>>>>>>> results >>>>>>>>>> show >>>>>>>>>> that with a larger TXFIFO depth, the bandwidth increased >>>>>>>>>> significantly. >>>>>>>>>> >>>>>>>>>> Test Parameters: >>>>>>>>>> - Platform: Qualcomm SM8150 >>>>>>>>>> - bMaxBurst = 6 >>>>>>>>>> - USB req size = 256kB >>>>>>>>>> - Num of USB reqs = 16 >>>>>>>>>> - USB Speed = Super-Speed >>>>>>>>>> - Function Driver: Mass Storage (w/ ramdisk) >>>>>>>>>> - Test Application: CrystalDiskMark >>>>>>>>>> >>>>>>>>>> Results: >>>>>>>>>> >>>>>>>>>> TXFIFO Depth = 3 max packets >>>>>>>>>> >>>>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>>>> ------------------------------------------- >>>>>>>>>> Sequential|1 GB x | >>>>>>>>>> Read |9 loops | 193.60 >>>>>>>>>> | | 195.86 >>>>>>>>>> | | 184.77 >>>>>>>>>> | | 193.60 >>>>>>>>>> ------------------------------------------- >>>>>>>>>> >>>>>>>>>> TXFIFO Depth = 6 max packets >>>>>>>>>> >>>>>>>>>> Test Case | Data Size | AVG tput (in MB/s) >>>>>>>>>> ------------------------------------------- >>>>>>>>>> Sequential|1 GB x | >>>>>>>>>> Read |9 loops | 287.35 >>>>>>>>>> | | 304.94 >>>>>>>>>> | | 289.64 >>>>>>>>>> | | 293.61 >>>>>>>>>> ------------------------------------------- >>>>>>>>>> >>>>>>>>>> Wesley Cheng (6): >>>>>>>>>> usb: gadget: udc: core: Introduce check_config to verify >>>>>>>>>> USB >>>>>>>>>> configuration >>>>>>>>>> usb: gadget: configfs: Check USB configuration before >>>>>>>>>> adding >>>>>>>>>> usb: dwc3: Resize TX FIFOs to meet EP bursting requirements >>>>>>>>>> of: Add stub for of_add_property() >>>>>>>>>> usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by >>>>>>>>>> default >>>>>>>>>> dt-bindings: usb: dwc3: Update dwc3 TX fifo properties >>>>>>>>>> >>>>>>>>>> .../devicetree/bindings/usb/snps,dwc3.yaml | 15 +- >>>>>>>>>> drivers/usb/dwc3/core.c | 9 + >>>>>>>>>> drivers/usb/dwc3/core.h | 15 ++ >>>>>>>>>> drivers/usb/dwc3/dwc3-qcom.c | 9 + >>>>>>>>>> drivers/usb/dwc3/ep0.c | 2 + >>>>>>>>>> drivers/usb/dwc3/gadget.c | 212 >>>>>>>>>> +++++++++++++++++++++ >>>>>>>>>> drivers/usb/gadget/configfs.c | 22 +++ >>>>>>>>>> drivers/usb/gadget/udc/core.c | 25 +++ >>>>>>>>>> include/linux/of.h | 5 + >>>>>>>>>> include/linux/usb/gadget.h | 5 + >>>>>>>>>> 10 files changed, 317 insertions(+), 2 deletions(-) >>>>>>>>>> -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project