On Sun, Apr 19, 2020 at 4:54 AM The Wanderer <wanderer@xxxxxxxxxxx> wrote: > > On 2020-04-18 at 22:11, The Wanderer wrote: > > > On 2020-04-18 at 21:52, Daniel Lenski wrote: > > > >> On Sat, Apr 18, 2020 at 6:30 PM The Wanderer <wanderer@xxxxxxxxxxx> > >> wrote: > > >> Well, this clarifies it. There's something screwed up with the > >> server, it appears: > >> > >> * The server gives you initial ESP configuration, and tells you "come > >> back and rekey 180 minutes later." > >> * OpenConnect gets new ESP keys 179 minutes later. > >> * Those new keys appear to work… for a minute. > >> * One minute after getting the new keys, or 180 minutes after getting > >> the original keys… the connection stops working. > >> * When OpenConnect attempts to fallback to HTTPS… the server rejects > >> the (supposed-to-be-unexpired) auth cookie. > >> * It therefore gives up and tries to log out. > >> > >> Unless you can show that some other client software manages to keep > >> the connection going >180 minutes, I have to assume that we're doing > >> the best we can here. > > > > As I believe I recall that 8.05 or possibly 8.02 was working, I've now > > downgraded to 8.05 and initiated another session. If it disconnects > > overnight, I'll repeat with 8.02 in the morning (my next mandatory > > on-shift is Monday). If that disconnects in its turn, then we can > > consider it confirmed that the server has changed its behavior in the > > meantime, and this has nothing to do with the upgrade and is not > > OpenConnect's problem. > > > > If one of those remains connected, however - especially if the logs show > > something else happening at this point - then it may be worth digging > > deeper here. > > When I woke up this morning (about half an hour ago as I begin this > message) and checked, the tunnel was still up, and a DNS-lookup attempt > - which *may* be one I'd already performed in the same session, as a > test of the tunnel right after opening it - succeeded. > > The only meaningful differences between this session and the overnight > one which disconnected, that I can think of, are: > > * It used OpenConnect version 8.05, instead of 8.08. > * It was carried out after two days or so without having had a tunnel > up, rather than no more than a day. (I was called in to be physically at > work for two days, with a day off work in between, so I had no reason to > connect.) > * It did not have the separate, manually-initiated repeated-ping > keepalive running. > > Unless one of those latter two seems plausible as making the difference > for why this remained up, I think we want to keep digging here. Yeah, I don't think any of those is likely significant. Like I said, I ran a test yesterday on another GP server with a 3-hour timeout. It rekeyed and continued the connection successfully after 3 hours, with OpenConnect v8.08. > It's going to take me some time to find the time to spare to do the > obfuscation / anonymization on the full log from this (again just under > 1MB). However, the text that went to the terminal doesn't seem to > contain anything that needs to be obfuscated, and has something that > looks like it might be a hint; based on that, I've extracted a small > fragment of the primary log - starting immediately after the "ESP > (encryption|authentication) type" lines from the first rekey - which > similarly needs no anonymization. Please see both attached. > > (Please ignore the non-timestamped RTNETLINK lines in the > terminal-output log; those appear to be caused by my kludgy routing > script, and I haven't figured out why it's producing them, since the > commands which would produce them as errors seem to be succeeding.) These logs are quite confusing to me. I don't understand why, or HOW, you are getting *repeated* messages of "ESP tunnel connected; exiting HTTPS mainloop." Immediately after printing that message, OpenConnect sets a flag which prevents it from entering that code section again. ¯\_(ツ)_/¯ https://gitlab.com/openconnect/openconnect/blob/v8.08/gpst.c#L1083-1084 > Judging from what I can see, this OpenConnect version may not even be > trying to switch to HTTPS at all; it just keeps waiting and retrying on > ESP, and eventually replies to older ESP packets start coming in and > things sync back up and work again. > > If this causes any noticeable lag or other discrepancies in my actual > VPN connection, I don't recall them. There aren't any code changes I can see that would explain this, we didn't do anything to fix such behavior, and I don't see any evidence of that in the logs you attached. Also, replying to wrong/older ESP packets on a regular basis would jam up your VPN connection thoroughly, and OpenConnect has an ESP replay-protection mechanism that would be complaining ("replayed packet" in the logs) if it were receiving packets significantly out of order. > Do I need to do the work to obfuscate the full log, or is this enough to > point us somewhere that might be useful? Should I try connecting again > (I've shut down the tunnel now) and actually using this VPN for a while, > to see whether I notice any lag etc. after the rekey occurs? I suggest that you try running openconnect without any kind of wrapper script, just "openconnect -u user --prot=gp server.compay.com/whatever --dump -vvvvv 2>&1 | tee openconnect.log". Do that in a terminal and see how it runs. It appears that the wrapper scripts you're using are adding a significant amount of confusion about what's going on here. Dan _______________________________________________ openconnect-devel mailing list openconnect-devel@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/openconnect-devel