On 02.04.2014 13:14, David Woodhouse wrote: > On Wed, 2014-04-02 at 12:55 +0200, Ortwin Gl?ck wrote: >> Also strace shows that the write() by scp returns EAGAIN very often even >> though the selector gets a signal that the fd is ready for writing: > > Thanks for the report. > > It would be useful if you were to strace openconnect itself, and see > precisely what it's doing. Here are its fds: total 0 lr-x------ 1 root root 64 Apr 2 15:04 0 -> /tmp/sh-thd-1396418206 (deleted) lrwx------ 1 root root 64 Apr 2 15:04 1 -> /dev/pts/1 lrwx------ 1 root root 64 Apr 2 10:38 2 -> /dev/pts/1 lr-x------ 1 root root 64 Apr 2 15:04 3 -> pipe:[61426] l-wx------ 1 root root 64 Apr 2 15:04 4 -> pipe:[61426] lrwx------ 1 root root 64 Apr 2 15:04 5 -> socket:[61427] lrwx------ 1 root root 64 Apr 2 15:04 6 -> socket:[64637] lrwx------ 1 root root 64 Apr 2 15:04 7 -> /dev/net/tun 6: https connection 7: tun $ strace -ttfp 29785 15:07:32.901346 select(8, [3 6 7], [], [6], {29, 999616}) = 1 (in [7], left {29, 781601}) -> data avail in tun -> loop starts here: 15:07:33.119660 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) -> unnecessary read from SSL, no data to write to SSL 15:07:33.119787 read(7, "E\10\5\0b\340@\0@\6\17\201\n\363X\236\n\271UE\222:\0\26\37O\3\26\315\21q\33"..., 1280) = 1280 15:07:33.119882 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource temporarily unavailable) -> tun data read, no data to write 15:07:33.119999 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) -> poll control fd 15:07:33.120106 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) -> unnecessary read from SSL 15:07:33.120370 write(6, "\27\3\1\0\30i\332\2\352j\203\262'\202\233;k\223\37/\1|b\314\247\202\313\234E\27\3\1"..., 1346) = 1346 -> write to SSL 15:07:33.120570 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource temporarily unavailable) -> tries to read tun again, no write 15:07:33.120651 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) -> poll control fd 15:07:33.120741 select(8, [3 6 7], [], [6], {29, 0}) = 1 (in [6], left {28, 990785}) -> data avail in SSL 15:07:33.130092 read(6, "\27\3\1\0X", 5) = 5 15:07:33.130181 read(6, "\312\r\246\304\32\317N\16\371\315;\373k\320\205\203\266v\207\342l\216\270T\302\314vW\362L\22\0"..., 88) = 88 15:07:33.130313 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) -> data read from SSL 15:07:33.130385 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource temporarily unavailable) -> no data in tun 15:07:33.130439 write(7, "E\10\0004\245\273@\0:\6\327q\n\271UE\n\363X\236\0\26\222:\315\21q\33\37O\7\342"..., 52) = 52 -> write data to tun 15:07:33.130526 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) -> poll control fd 15:07:33.130585 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.130640 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.130688 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) 15:07:33.130745 select(8, [3 6 7], [], [6], {30, 0}) = 1 (in [7], left {29, 783272}) ===> HUGE of 217ms delay here 15:07:33.347681 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.347806 read(7, "E\10\5\0b\343@\0@\6\17~\n\363X\236\n\271UE\222:\0\26\37O\7\342\315\21q\33"..., 1280) = 1280 15:07:33.347954 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.348041 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) 15:07:33.348134 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.348378 write(6, "\27\3\1\0\30R\351^\226\332Y\n*\300\r\354Oc\334kb%\205(\223gB\257\260\27\3\1"..., 1346) = 1346 15:07:33.348600 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.348708 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) 15:07:33.348796 select(8, [3 6 7], [], [6], {30, 0}) = 1 (in [6], left {29, 991582}) 15:07:33.357292 read(6, "\27\3\1\0X", 5) = 5 15:07:33.357329 read(6, "\3\263\36V\220$u\6N\352\0203\312\323\330\r+\27.g\5\256\20\0\341\325E*\316Khc"..., 88) = 88 15:07:33.357390 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.357425 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.357453 write(7, "E\10\0004\245\274@\0:\6\327p\n\271UE\n\363X\236\0\26\222:\315\21q\33\37O\f\256"..., 52) = 52 15:07:33.357508 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) 15:07:33.357543 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.357575 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource temporarily unavailable) 15:07:33.357602 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) 15:07:33.357634 select(8, [3 6 7], [], [6], {30, 0}) = 1 (in [7], left {29, 782110}) ==> HUGE delay of 218ms here 15:07:33.575758 read(6, 0x1dfc973, 5) = -1 EAGAIN (Resource temporarily unavailable) So it looks like openconnect is just waiting for packets to become available on the socket. I have straced the scp/ssh process pair better (scp just writes to a pipe to ssh). ssh's socket is its fd 3: 15:36:01.141172 select(7, [3], [3], NULL, NULL) = 1 (out [3]) 15:36:01.533392 write(3, "\t\302\273\311\25F\253\361\321\335^9\241]\v\212>1\203\23\300\313(\325\267\331~\360\216:\235\22"..., 162620) = 14736 15:36:01.533536 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0 15:36:01.533640 select(7, [3], [3], NULL, NULL) = 1 (out [3]) 15:36:01.569605 write(3, "\323\237l\"\221E\341\362\217\366\200\354[\314a\277\364Q\300\177\4\200Ty\217\33r\26PJP\354"..., 147884) = 17192 15:36:01.569988 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0 15:36:01.570081 select(7, [3 4], [3], NULL, NULL) = 1 (in [4]) 15:36:01.570201 read(4, "\367\1774\262t\246\330\310\2777A^\261|B\320\203\361\375@\365\345\351~\263\23\364\207\232@\373\311"..., 16384) = 16384 15:36:01.570372 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0 15:36:01.570467 select(7, [3 4], [3], NULL, NULL) = 1 (in [4]) 15:36:01.570596 read(4, "\251\254\223\363\274G\233\270\306D\367\36\177\231\256G\345\273\214\2371\177e\350\357\7\313HO\353\355\322"..., 16384) = 16384 15:36:01.570740 mremap(0x7f7077b41000, 856064, 888832, MREMAP_MAYMOVE) = 0x7f7077b41000 15:36:01.570796 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0 15:36:01.570845 select(7, [3], [3], NULL, NULL) = 1 (out [3]) 15:36:03.485027 write(3, "\v\374\315nl\225\371\350\2065\253\3326b\27\375\27\2\217[P\217\231s\263I~Q\321VW\237"..., 163508) = 14736 15:36:03.485214 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0 15:36:03.485314 select(7, [3], [3], NULL, NULL) = 1 (out [3]) Seems to behave like it should with several seconds between select/write and writes in chunks of 14KB. Question remains: why does openconnect's select block for so long? A lot of data should be already available in the socket buffer. > If you could show the tcpdump both from the > tunnel interface *and* the private network between client and server > while this is happening, it would also be enlightening. Here you go. I have replaced the actual IPs with names for convenience and privacy. Timestamps are seconds since previous packet. Note the 0.214s delays before local starts sending again after last ACK. No. Time Source Destination Protocol Length Info 1100 0.214 tun-local tun-remote SSHv2 1280 Encrypted request packet len=1228 1101 0.000 SSL-local SSL-remote TLSv1 1126 Application Data 1102 0.000 SSL-local SSL-remote TLSv1 328 Application Data 1103 0.007 SSL-remote SSL-local TCP 60 443 > 54547 [ACK] Seq=19758 Ack=204341 Win=32768 Len=0 1104 0.001 SSL-remote SSL-local TLSv1 147 Application Data 1105 0.000 SSL-local SSL-remote TCP 54 54547 > 443 [ACK] Seq=204615 Ack=19851 Win=65535 Len=0 1106 0.000 tun-remote tun-local TCP 52 22 > 37376 [ACK] Seq=6189 Ack=184836 Win=64128 Len=0 TSval=3122435631 TSecr=13382833 1107 0.214 tun-local tun-remote SSHv2 1280 Encrypted request packet len=1228 1108 0.000 SSL-local SSL-remote TLSv1 1126 Application Data 1109 0.000 SSL-local SSL-remote TLSv1 328 Application Data 1110 0.007 SSL-remote SSL-local TCP 60 443 > 54547 [ACK] Seq=19851 Ack=205687 Win=32768 Len=0 1111 0.001 SSL-remote SSL-local TLSv1 147 Application Data 1112 0.000 SSL-local SSL-remote TCP 54 54547 > 443 [ACK] Seq=205961 Ack=19944 Win=65535 Len=0 1113 0.000 tun-remote tun-local TCP 52 22 > 37376 [ACK] Seq=6189 Ack=186064 Win=64128 Len=0 TSval=3122435855 TSecr=13382889 1114 0.214 tun-local tun-remote SSHv2 1280 Encrypted request packet len=1228 > But I don't think it's the short-term solution to > your problem. Seems right. Ortwin