On Wed, 2014-04-02 at 15:46 +0200, Ortwin Gl?ck wrote: > $ strace -ttfp 29785 After manually editing that to undo the word-wrapping and make it readable again, it looks like the two large delays you see in select() are when it's waiting for data from the tunnel. I'm not sure it can make that happen faster. The problem here isn't the way the event loop works, AFAICT. > Question remains: why does openconnect's select block for so long? A lot > of data should be already available in the socket buffer. It's not the socket buffer we care about, from the openconnect point of view. It's packets being "sent" on the tunnel interface. And openconnect is sending those as soon as they get here... > Here you go. I have replaced the actual IPs with names for convenience > and privacy. Timestamps are seconds since previous packet. Note the > 0.214s delays before local starts sending again after last ACK. Hm, I was hoping for timestamps which would correlate with the ones in the strace run, but I can cope with this... > No. Time Source Destination Protocol Length Info > 1100 0.214 tun-local tun-remote SSHv2 1280 Encrypted request packet len=1228 > 1101 0.000 SSL-local SSL-remote TLSv1 1126 Application Data > 1102 0.000 SSL-local SSL-remote TLSv1 328 Application Data A packet is 'sent' over the tun interface. OpenConnect immediately sends it. It takes *two* packets to send it, since your MTU inside the tunnel is larger than your MTU outside the tunnel. That's probably not going to help your performance. And neither is the fact that you appear not to be using DTLS. Nobody who cares about performance should be using TCP-over-TCP. Why aren't you using DTLS? > 1103 0.007 SSL-remote SSL-local TCP 60 443 > 54547 [ACK] Seq=19758 Ack=204341 Win=32768 Len=0 Fairly quickly, an ACK comes back from the VPN server for that data. > 1104 0.001 SSL-remote SSL-local TLSv1 147 Application Data > 1105 0.000 SSL-local SSL-remote TCP 54 54547 > 443 [ACK] Seq=204615 Ack=19851 Win=65535 Len=0 1ms later, a data packet comes back over the SSL connection, and an ACK is sent for it. > 1106 0.000 tun-remote tun-local TCP 52 22 > 37376 [ACK] Seq=6189 Ack=184836 Win=64128 Len=0 We feed the data packet (which is itself an ACK for the SSH connection) to the tun device with no noticeable delay. > 1107 0.214 tun-local tun-remote SSHv2 1280 Encrypted request packet len=1228 There is a 214ms delay before the next packet is sent by SSH (or more to the point the TCP stack). The sequence repeats... > 1108 0.000 SSL-local SSL-remote TLSv1 1126 Application Data > 1109 0.000 SSL-local SSL-remote TLSv1 328 Application Data > 1110 0.007 SSL-remote SSL-local TCP 60 443 > 54547 [ACK] Seq=19851 Ack=205687 Win=32768 Len=0 > 1111 0.001 SSL-remote SSL-local TLSv1 147 Application Data > 1112 0.000 SSL-local SSL-remote TCP 54 54547 > 443 [ACK] Seq=205961 Ack=19944 Win=65535 Len=0 > 1113 0.000 tun-remote tun-local TCP 52 22 > 37376 [ACK] Seq=6189 Ack=186064 Win=64128 Len=0 > 1114 0.214 tun-local tun-remote SSHv2 1280 Encrypted request packet len=1228 Why is it taking 7ms for a pair of TCP packets to reach your VPN server and then for an ACK to come back for them? I can't see the sequence numbers for your outgoing packets but I have strong suspicion that there's a fair amount of data in-flight at any given time. So it's not that the SSH connection sends a data packet, gets an ACK within 7ms and then gratuitously waits around for almost quarter of a second before sending any more data. I suspect it's got packets outstanding and the TCP stack has determined that this is the maximum rate it can send at. What's the TCP sendq on the offending SSH socket (and the SSL socket for that matter, but I suspect that'll be low) when this happens? (netstat -tn) -- dwmw2 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5745 bytes Desc: not available URL: <http://lists.infradead.org/pipermail/openconnect-devel/attachments/20140402/4b0fbe87/attachment.bin>