On Mon, Mar 25, 2019 at 11:54 AM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
>
> On Mon, 2019-03-25 at 11:41 +0200, Daniel Lenski wrote:
> > On Mon, Mar 25, 2019 at 10:29 AM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> > >
> > > On Sun, 2019-03-24 at 19:13 +0200, Daniel Lenski wrote:
> > > >
> > > > Do I have this right? High packet loss from client→VPN, low packet
> > > > loss from VPN→client?
> > > >
> > > > If so, I'm guessing your problems are MTU-related.
> > >
> > > Hm, wouldn't we expect that to be more consistent? If the full-sized
> > > packets are getting lost, that would just stall and not lose the
> > > *occasional* packet?
> >
> > Yeah… should be. My guess is based on a couple of previous
> > less-detailed reports from users of earlier versions with GP.
> >
> > > If it really is a repeatable drop every N packets, I might be inclined
> > > to look at sequence numbers and epoch handling. Are we doing any ESP
> > > rekeying?
> >
> > We are rekeying, but only using the most naïve "tunnel rekey" method.
> > AFAIK, that's all that GP supports.
> > https://gitlab.com/openconnect/openconnect/blob/v8.02/gpst.c#L1153-1157
> >
> > After a certain time has elapsed, we tear down the TLS connection and
> > reconnect (using the same auth cookie), which also invalidates the
> > previous ESP keys and requires us to start using new ones. We should
> > handle late incoming packets using the "old" ESP keys correctly, using
> > the same method as with Juniper.
>
> We might be handling late incoming packets correctly, but we stop
> actually sending them. I wonder if we should continue to send ESP
> packets on the "old" connection even while we're doing the reconnect?

Hmmm… I don't think so.
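For reference, the "handle late incoming packets using the old ESP keys" idea can be sketched roughly as below. This is only an illustrative toy model, not OpenConnect's actual code: the `esp_ctx`/`esp_decrypt`/`esp_receive` names and the XOR "cipher" are stand-ins for the real ESP state, SPI lookup, and crypto.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an ESP crypto context: real code holds the SPI,
 * cipher/HMAC keys, and sequence-number state. */
struct esp_ctx {
	uint32_t spi;
	uint8_t  key;    /* toy single-byte "key" */
	int      valid;
};

/* Toy "decrypt": succeeds only if the SPI matches this context;
 * XOR stands in for the real cipher. Returns 0 on success. */
static int esp_decrypt(const struct esp_ctx *ctx, uint32_t spi,
		       const uint8_t *in, uint8_t *out, size_t len)
{
	if (!ctx->valid || ctx->spi != spi)
		return -1;
	for (size_t i = 0; i < len; i++)
		out[i] = in[i] ^ ctx->key;
	return 0;
}

/* Try the current keys first, then fall back to the "old" context,
 * so packets that were in flight across a rekey still decrypt. */
static int esp_receive(const struct esp_ctx *cur, const struct esp_ctx *old,
		       uint32_t spi, const uint8_t *in, uint8_t *out, size_t len)
{
	if (esp_decrypt(cur, spi, in, out, len) == 0)
		return 0;
	return esp_decrypt(old, spi, in, out, len);
}
```

The point of the fallback is only about *incoming* packets: nothing here sends on the old context, which matches the "I don't think so" above.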
I did a whole lot of testing of the "tap-dance" required to enable the
ESP tunnel early on, and as far as I can tell there are two main
points:

1) As soon as the client config request (POST /ssl-vpn/getconfig.esp)
   is received, any pre-existing ESP keys become invalid immediately,
   and new ESP keys become valid immediately.

2) If the client ever connects to the TLS tunnel (bogus pseudo-CONNECT
   GET request to /ssl-tunnel-connect.sslvpn), the existing ESP keys
   immediately become invalid.

Given this, I believe the safe behavior is to disable the UDP
connection entirely before the reconnect starts, and just let the
outgoing packet queue grow.

Currently, we're actually *not* disabling the UDP connection before
starting the reconnect
(https://gitlab.com/openconnect/openconnect/blob/v8.02/gpst.c#L1162-1171),
but we probably should be. Maybe try this patch…?

diff --git a/gpst.c b/gpst.c
index a0dc81f..5cd1aab 100644
--- a/gpst.c
+++ b/gpst.c
@@ -1160,6 +1160,8 @@ int gpst_mainloop(struct openconnect_info *vpninfo, int *timeout)
 			vpn_progress(vpninfo, PRG_ERR,
 				     _("GPST Dead Peer Detection detected dead peer!\n"));
 	do_reconnect:
+		if (vpninfo->proto->udp_close)
+			vpninfo->proto->udp_close(vpninfo);
 		ret = ssl_reconnect(vpninfo);
 		if (ret) {
 			vpn_progress(vpninfo, PRG_ERR, _("Reconnect failed\n"));

> But a reconnect/rekey would be clearly visible in OpenConnect output.
> Tony, presumably you'd have seen that and mentioned it?

Yeah, there should be a PRG_INFO message on initial connection and
reconnection: "Tunnel timeout (rekey interval) is %d minutes."

Anyway, I kind of doubt reconnect/rekey is playing a role here… all
the real GP VPNs I've heard about have rekey intervals of at least 20
minutes.

> Also, you said that you hit this at a repeatable 4142 packets into a
> TCP connection? That was regardless of how long the VPN had been up?

I think Tony said it was the TCP sequence number, no? That part is
mystifying.
Unless/until the gateway has successfully decrypted the ESP packet, it
should have no idea about the TCP seqno, right?

Dan

_______________________________________________
openconnect-devel mailing list
openconnect-devel@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/openconnect-devel