Hello Oliver and Maciej, On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > Can you tell us which side in your test case produces many small packets? > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in > f_ncm.c by an order of magnitude. The device side, Apalis iMX6, is the one producing the small packages and sending to my host PC. The VNC server is running on iMX6 device and, the client, on my host PC. I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing changed, the behaviour is the same. On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: > That suggests, but does not prove that the issue is on the host side. > Could you post the result of "ethtool -S" after a test run? We should > get statistics on the reasons for transmissions that way. Sure, for some reason I couldn't run ethtool on the iMX6 device: / # ethtool -S usb0 no stats available So I ran everything on my Debian host PC. First, without any changes on the device's kernel, this is the result (where the VNC is really slow/frozen): $ sudo ethtool -S enx3e5dcdead75e NIC statistics: tx_reason_ntb_full: 0 tx_reason_ndp_full: 0 tx_reason_timeout: 222 tx_reason_max_datagram: 0 tx_overhead: 42387 tx_ntbs: 222 rx_overhead: 38375 rx_ntbs: 529 Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to 8192. The performance improved a bit, and this is the result: $ sudo ethtool -S enx42ff68c1000a NIC statistics: tx_reason_ntb_full: 0 tx_reason_ndp_full: 0 tx_reason_timeout: 321 tx_reason_max_datagram: 0 tx_overhead: 61617 tx_ntbs: 321 rx_overhead: 59341 rx_ntbs: 759 Finally, I changed from 8192 to 4096, and the perfomance was better: $ sudo ethtool -S enx3a601e306de1 NIC statistics: tx_reason_ntb_full: 0 tx_reason_ndp_full: 0 tx_reason_timeout: 56067 tx_reason_max_datagram: 0 tx_overhead: 83630876 tx_ntbs: 56064 rx_overhead: 25437595 rx_ntbs: 847920 At 4096 I can use the VNC with my app, click on buttons and see the mouse moving smoothly. Please note the device name changes because we're using random MAC addresses. 'ethtool' was running on my Debian host PC. I tested for 1min30s and then got the statics with ethtool for all 3 tests. On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote: > > An every 1s (the default) ping is too rare to be of help I'd assume... > Try ping with various intervals (-i). All the way down to a flood ping (-f). > Most likely -i 0.01 would be enough to make things work better... This one is interesting, backing to the default value of 16384, I launched the VNC client which now is back to be slow/frozen. In parallel, I started the ping command. The first one with 1 second results the following (I'm pingging my host PC, using my device's terminal): / # ping 192.168.11.2 -i 1 PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms ^C --- 192.168.11.2 ping statistics --- 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5 Ping is really slow and lost almost all packets. Next test, I decreased to 0.1s: / # ping 192.168.11.2 -i 0.1 PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. ... --- 192.168.11.2 ping statistics --- 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26 While ping is running in parallel, VNC has a better performance, I can see my mouse running and click on some buttons. As soon as ping stops, VNC is slow/frozen again. Also we have less packet loss. Next test, decreased to 0.01: / # ping 192.168.11.2 -i 0.01 PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. ... --- 192.168.11.2 ping statistics --- 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28 And finally, the flood: / # ping 192.168.11.2 -f PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. ... --- 192.168.11.2 ping statistics --- 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma 12.413/7.802 ms While the flood is happening in parallel, the VNC runs very smoothly, and, again, as soon as it stops, it's back to slow/frozen. I believe here the ping command is helping to fullfil the buffer, that's why running it on parallel makes the VNC work... > Also which specific versions of the kernel are involved on both sides > of the connection. Device iMX6 is running Linux kernel v6.1.65, while my host PC is running Linux kernel v6.5.0. > There was a pretty recent fix related to packet aggregation recently > that could be either the fix or the cause. > "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call" > Though I doubt it - I believe that was specific to how windows packs things. > > Also Krishna Kurapati has a (afaik still not merged) patch "usb: > gadget: ncm: Add support to configure wMaxSegmentSize" > that could be of use - though again, doubt it. I could also try to apply these patches and check how it goes. Thanks for the information. > Another thing that comes to mind, is that perhaps the device in > question does not have sufficiently high res timers? > There might be something in the kernel boot log / dmesg about hrtimer > resolution... > Maybe this just needs to be configurable... Or pick a smaller value > with broken hrtimer (if that's the issue), > or just disable aggregation if lowres hrtimers... etc... > > #define TX_TIMEOUT_NSECS 300000 > 300 us is too small to be noticeable by VNC imho, so I think something > *must* be misbehaving. > Perhaps timer resolution is bad and this 300us ends up being much larger...??? This is what I got from dmesg inside the iMX6 device: / # dmesg | grep timer [ 0.000000] Switching to timer-based delay loop, resolution 333ns [ 0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 637086815595 ns [ 0.001545] Calibrating delay loop (skipped), value calculated using timer frequency.. 6.00 BogoMIPS (lpj=30000) [ 0.203469] clocksource: Switched to clocksource mxc_timer1 > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME > instead of CLOCK_MONOTONIC. > There could potentially be an issue with suspend, though I really doubt it. Also tested this, but it didn't change anything, VNC is still slow/frozen. > Another idea would be to add a gadget setting to disable tx > aggregation entirely... > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...) > > Have you tried reducing from 8000 to 4000 or 3500 or 3000? > Maybe there's some funkiness with page sizes?? Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE? If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I reduced the value, it got better and better, VNC running smoothly. I will keep testing to see if I get something. Thanks and regards, Hiago.