Re: GlobalProtect connection loss

Daniel Lenski <dlenski@xxxxxxxxx> · Sat, 18 Apr 2020 18:52:30 -0700

On Sat, Apr 18, 2020 at 6:30 PM The Wanderer <wanderer@xxxxxxxxxxx> wrote:
>
> On 2020-04-18 at 20:14, Daniel Lenski wrote:
>
> > This is a useful log. Key interesting parts below.
>
> <large snip>
>
> > 5. This suggests a server misconfiguration. The server has sent new
> > ESP keys and started using them, but then it gets confused and
> > invalidates the ESP connection after the original interval elapses.
> > That's annoying, and we've seen it before… but what OpenConnect
> > should do to workaround it here is that it should give up on ESP and
> > attempt to fallback to HTTPS. Which it appears to do:
> >
> > [2020-04-14 00:20:43] Connecting to HTTPS tunnel endpoint ...
> > [2020-04-14 00:20:43] SSL negotiation with foo.bar.baz
> > [2020-04-14 00:20:43] Connected to HTTPS on foo.bar.baz with ciphersuite
> > (TLS1.2)-(RSA)-(AES-256-GCM)
> > [2020-04-14 00:20:43] > GET
> > /ssl-tunnel-connect.sslvpn?authcookie=AUTHCOOKIE&user=foobar_user HTTP/1.1
> > [2020-04-14 00:20:43] >
> >
> > … but what's completely puzzling here is that we don't see either of
> > the following messages prior to this attempted failover to HTTPS:
> >     ESP detected dead peer
> >     Failed to connect ESP tunnel; using HTTPS instead.
>
> That's because of a quirk of the way I invoked OpenConnect and did the
> redirection; those messages actually get printed to the terminal where
> the program was invoked from, and don't make it into the file. This is
> what I was referring to with

I really don't understand how — or why — you're logging this so that
*some* of these messages go to the terminal, and others *only* go to
the log file, but in any case the lines that were missing here are
critical information.

> >> In hindsight I realize there was one thing I could have done
> >> differently on this session invocation to be more sure I didn't
> >> miss any stderr, etc., messages - but I didn't do it, so I'm not
> >> entirely positive there's nothing missing. I hope all the important
> >> stuff is there.
>
> in the original attempt at sending this.
>
> The relevant messages from the terminal from one such session, starting
> from that point and continuing until exit, are:
>
> >> [2020-04-15 14:25:56] ESP detected dead peer
> >> [2020-04-15 14:26:01] Failed to connect ESP tunnel; using HTTPS instead.
> >> [2020-04-15 14:26:01] Got inappropriate HTTP GET-tunnel response: HTTP/1.1 502 Bad Gateway
> >> [2020-04-15 14:26:01] Invalid user name
> >> [2020-04-15 14:26:01] Logout failed.
> >> RTNETLINK answers: No such process
> >> RTNETLINK answers: No such process
> >> [2020-04-15 14:26:01] Unknown error; exiting.
>

Well, this clarifies it. There's something screwed up with the server,
it appears:

* The server gives you initial ESP configuration, and tells you "come
back and rekey 180 minutes later."
* OpenConnect gets new ESP keys 179 minutes later.
* Those new keys appear to work… for a minute.
* One minute after getting the new keys, or 180 minutes after getting
the original keys… the connection stops working.
* When OpenConnect attempts to fallback to HTTPS… the server rejects
the (supposed-to-be-unexpired) auth cookie.
* It therefore gives up and tries to log out.

Unless you can show that some other client software manages to keep
the connection going >180 minutes, I have to assume that we're doing
the best we can here.

> > 6. Finally, one more strange thing in your log. Something (????)
> > causes it to attempt a logout immediately after trying to fallback to
> > HTTPS, but the logout *fails* (POST
> > https://foo.bar.baz/ssl-vpn/logout.esp -> "Invalid user name").
>
> Yeah, this is the thing that seemed weirdest to me. I certainly did not
> do anything which should have led this logout attempt to happen - as I
> noted in my original mail, I was asleep at the time.
>
> I just checked one of the other logs I've been grabbing of these
> sessions in the meantime, and it does end with a similar logout interaction.

I'm guessing this is because of an inconsistent encoding of the domain
parameter (or maybe something else). I've seen it happen before when
the server domain is "(empty_domain)". just posted a patch to try to
alleviate this issue.
https://gitlab.com/openconnect/openconnect/-/merge_requests/93

> What I'm doing here is invoking a shell script (as ordinary user) from
> the command line, which runs
>
> su - -c "~/bin/vpn-bar 2>&1 >>
> /home/ORIGINALUSER/tmp/vpn-bar-highverbosity.log"
>
> and that command line in turn calls a shell script which sits under
> /root/bin/, gets run as root, and contains the actual invocation of
> openconnect - including
>   echo password | openconnect --passwd-on-stdin [other-options]
> and a custom routing script which adjusts the routing table so that only
> traffic which *needs* the VPN goes over it (that latter being why I need
> to run this as root, rather than pre-creating a tun* device node and
> telling openconnect to use it).

Something like what vpn-slice does
(https://github.com/dlenski/vpn-slice/), but without needing a messy
intermediary script?

-Dan

_______________________________________________
openconnect-devel mailing list
openconnect-devel@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/openconnect-devel