On 5/9/23 15:42, Thinh Nguyen wrote: > On Tue, May 09, 2023, Thinh Nguyen wrote: > > Just want to clarify that complete here is from the perspective of the > controller. The controller driver doesn't automatically update the > request and give back the request to the gadget driver yet. > > BR, > Thinh > Hey all, Sorry in advance for the long message. I have been running more tests to quantify if (and how) performance is affected with my patch. I wanted to quantify what the 8ms latency looks like, so I modified the gadget driver to track some stats. Specifically, I tracked the round trip time of usb_requests and the time between two consecutive frames being queued. TL;DR: Naively queueing up 0-length requests as fast as possible has the consequence of delaying the entire stream by ~8ms, but does not seem to be delaying frames individually any more than before. Changing the number of 0-length requests has very little consequence on the stream, but increases the chances of missed ISOCs. For usb_requests' RTTs, I tracked the time it takes between 'usb_ep_queue' and the controller's 'complete' callback to return the usb_request. This is affected by the 'no_interrupt' flag as the controller will call 'complete' callback in batches. Frame time is defined as the time elapsed between completely queueing two consecutive video frames. This is meant to give an idea of how long the frame sat around waiting for usb_requests to come back. These were collected in 4 settings: 1. Unpatched: As the name implies, ToT! 2. Burst: (Name borrowed from Laurent) Queue requests ASAP. This is the implementation in https://lore.kernel.org/20230508231103.1621375-1-arakesh@xxxxxxxxxx/. 3. Burst w/ more interrupts: Same as (2) but instead of interrupting every 16th 0-length request, interrupt every 8th request. The idea was to increase the gadget's driver responsiveness at the cost of extra interrupts. 4. Burst w/ limited 0-length packets: Same as Burst but only queue up at most 32 0-length requests. The idea was to reduce the number of empty usb_requests that a video frame could get stuck behind. Some other things to note before the logs: - All times are reported in ns - Samples were randomly collected during an ongoing 1080p @ 30fps stream - The number of allocated usb_requests was fixed at 64 for all tests. Okay, so here are the numbers: 1. Unpatched: ``` usb_request rtt : min/max/mean: 3860718 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3857951 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3854004 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3847575 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3841594 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3834066 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3426879 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3414632 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3408203 : 220744/5710327/3372587 usb_request rtt : min/max/mean: 3401489 : 220744/5710327/3372587 ``` ``` frame time: curr: min/max/mean: 33385376 : 15103638/60926513/33474315 frame time: curr: min/max/mean: 31438558 : 15103638/60926513/33472507 frame time: curr: min/max/mean: 31650391 : 15103638/60926513/33470890 frame time: curr: min/max/mean: 36657959 : 15103638/60926513/33473715 frame time: curr: min/max/mean: 31758260 : 15103638/60926513/33472195 frame time: curr: min/max/mean: 34915934 : 15103638/60926513/33473472 frame time: curr: min/max/mean: 34599447 : 15103638/60926513/33474467 frame time: curr: min/max/mean: 33658162 : 15103638/60926513/33474629 frame time: curr: min/max/mean: 44577637 : 15103638/60926513/33484428 ``` With no changes, the usb_requests spend ~3ms in flight, and there is one video frame dispatched every ~33ms. Unfortunately (or perhaps fortunately?), it seems like the system latencies add to >8ms of per frame jitter when encoding frames. 2. Burst: ``` usb_request rtt curr : min/max/mean: 7966878 : 1663778/9449341/7878007 usb_request rtt curr : min/max/mean: 7964966 : 1663778/9449341/7878008 usb_request rtt curr : min/max/mean: 7963582 : 1663778/9449341/7878009 usb_request rtt curr : min/max/mean: 7920817 : 1663778/9449341/7878009 usb_request rtt curr : min/max/mean: 7923503 : 1663778/9449341/7878009 usb_request rtt curr : min/max/mean: 7916545 : 1663778/9449341/7878009 usb_request rtt curr : min/max/mean: 7914021 : 1663778/9449341/7878009 usb_request rtt curr : min/max/mean: 7914348 : 1663778/9449341/7878009 usb_request rtt curr : min/max/mean: 7914469 : 1663778/9449341/7878009 ``` ``` frame time: curr: min/max/mean: 31675130 : 14732381/121984294/33563117 frame time: curr: min/max/mean: 33192668 : 14732381/121984294/33563095 frame time: curr: min/max/mean: 35761963 : 14732381/121984294/33563223 frame time: curr: min/max/mean: 30776733 : 14732381/121984294/33563059 frame time: curr: min/max/mean: 31940471 : 14732381/121984294/33562964 frame time: curr: min/max/mean: 39079834 : 14732381/121984294/33563286 frame time: curr: min/max/mean: 30279500 : 14732381/121984294/33563093 frame time: curr: min/max/mean: 38558472 : 14732381/121984294/33563385 frame time: curr: min/max/mean: 34294149 : 14732381/121984294/33563427 frame time: curr: min/max/mean: 26993286 : 14732381/121984294/33563042 ``` As discussed in the thread above, we do in fact see usb_requests taking an average of 8ms to be returned to the gadget. This means that the driver is generally keeping up the back pressure to the controller, which is consistent with no missed ISOCs. Slightly more interesting is that the frames are still being dispatched at the same frequency as (1), albeit a fraction slower than (1). The jitters for individual frames are difficult to show here, but they seemed about the same as (1). IIUC, this means that the 8ms delay gets applied to the entire the entire stream once a steady state is reached. However, with system latencies already showing jitter >8ms, it is possible that a slight regression here is being covered up. 3. Burst w/ more interrupts ``` usb_request rtt curr : min/max/mean: 7982137 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7998698 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7998088 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7933024 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7948364 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7945476 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7968872 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7982219 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7983155 : 906942/11119832/7712539 usb_request rtt curr : min/max/mean: 7649048 : 906942/11119832/7712538 ``` ``` frame time: curr: min/max/mean: 40471069 : 21488363/46635051/33474386 frame time: curr: min/max/mean: 29503703 : 21488363/46635051/33473695 frame time: curr: min/max/mean: 27489502 : 21488363/46635051/33472654 frame time: curr: min/max/mean: 36647908 : 21488363/46635051/33473205 frame time: curr: min/max/mean: 34343506 : 21488363/46635051/33473356 frame time: curr: min/max/mean: 31686198 : 21488363/46635051/33473045 frame time: curr: min/max/mean: 34582601 : 21488363/46635051/33473237 frame time: curr: min/max/mean: 32652425 : 21488363/46635051/33473094 frame time: curr: min/max/mean: 31930054 : 21488363/46635051/33472826 frame time: curr: min/max/mean: 34652669 : 21488363/46635051/33473030 ``` There were no meaningful differences in usb_request RTTs from (2), which is expected, as the driver can still comfortably maintain the back pressure. However, we do see a very slight increase in responsiveness as the frames are queued at approximately the same frequency as (1). However, the per frame jitter is consistent with (1) and (2), so this could just be a sampling issue. 4. Burst w/ limited 0-length packets: ``` usb_request rtt curr : min/max/mean: 6591879 : 644246/7986613/4202894 usb_request rtt curr : min/max/mean: 6585978 : 644246/7986613/4202929 usb_request rtt curr : min/max/mean: 6582805 : 644246/7986613/4202964 usb_request rtt curr : min/max/mean: 3980957 : 644246/7986613/4202960 usb_request rtt curr : min/max/mean: 4000162 : 644246/7986613/4202956 usb_request rtt curr : min/max/mean: 3995484 : 644246/7986613/4202952 usb_request rtt curr : min/max/mean: 3989380 : 644246/7986613/4202948 usb_request rtt curr : min/max/mean: 3998942 : 644246/7986613/4202944 usb_request rtt curr : min/max/mean: 3998779 : 644246/7986613/4202940 ``` ``` frame time: curr: min/max/mean: 35985392 : 21889445/42860636/33474731 frame time: curr: min/max/mean: 34428670 : 21889445/42860636/33476020 frame time: curr: min/max/mean: 35499146 : 21889445/42860636/33478750 frame time: curr: min/max/mean: 40315307 : 21889445/42860636/33487963 frame time: curr: min/max/mean: 24934611 : 21889445/42860636/33476451 frame time: curr: min/max/mean: 32321005 : 21889445/42860636/33474897 frame time: curr: min/max/mean: 33268799 : 21889445/42860636/33474620 frame time: curr: min/max/mean: 37001586 : 21889445/42860636/33479347 frame time: curr: min/max/mean: 30968384 : 21889445/42860636/33475985 ``` This showed the most jitter in usb_request RTTs, likely because controller was frequently underrun or on the verge of underrunning. So while the requests were delayed by an average of 4ms, there was significant deviation per packet. The video frames were still encoded at a stable frequency, but there were still flickers from when ISOC requests would eventually underrun. In conclusion, it seems like the naive approach (2) has similar characteristics to the current version (1), albeit with an 8ms stream latency, and optimizing by means of changing interrupt frequency or number of requests result is minor, if any, performance increases. - Avi.