Re: [PATCH net-next v11 05/23] ovpn: keep carrier always on

Sergey Ryazanov <ryazanov.s.a@xxxxxxxxx> · Mon, 25 Nov 2024 23:32:24 +0200

On 25.11.2024 15:07, Antonio Quartulli wrote:
On 25/11/2024 03:26, Sergey Ryazanov wrote:
OpenVPN (userspace) will tear down the P2P interface upon 
disconnection, assuming the --persist-tun option was not specified by 
the user.

So the interface is gone in any case.

By keeping the netcarrier on we are just ensuring that, if the user 
wanted persist-tun, the iface is not actually making decisions on its 
own.

Regarding "a decision on its own": an Ethernet interface goes to the
non-running state upon loss of carrier from a switch. That can hardly be
considered a decision of the interface. It's an indication of a fact.

Similarly, the beeping of a UPS is not its decision to make the user's
life miserable; it's an indication of a power line failure. I hope we
at least both agree that a UPS should indicate the line failure.

The answer is always "it depends".

Back to the 'persist-tun' option. I checked the openvpn(8) man page.
It gives reasonable hints to use this option to avoid negative
outcomes on an internal openvpn process restart, e.g. in case of
privilege dropping. It serves the same purpose as 'persist-key'. And
there is not a word regarding traffic leaking.

FTR, here is the text in the manpage:

        --persist-tun
               Don't close and reopen TUN/TAP device or run up/down
               scripts across SIGUSR1 or --ping-restart restarts.

               SIGUSR1 is a restart signal similar to SIGHUP, but which
               offers finer-grained control over reset options.

SIGUSR1 is a session reconnection, not a process restart.
The manpage just indicates what happens at the low level when this 
option is provided.

Still no mention of traffic leak prevention, is there?

The next question is: what is this useful for? Many things, among those 
there is the fact the interface will retain its configuration (i.e. IPs, 
routes, etc).

This is unrelated to the correct operational state indication. Addresses 
and routes are not reset in case of interface going to non-running state.

If somebody has decided that this option has a funny side effect that
allows cutting corners, then I cannot say anything but sorry.

Well, OpenVPN is more than 20 years old.

More than 20 years of misguiding users has been duly noted :)

Should I mention that RFC 1066, containing the ifOperStatus definition,
was issued 12 years before OpenVPN? And then it was updated with
multiple clarifications.

If a given API allows a specific user behaviour and has done so for
that many years, changing it is still a user breakage. Not much we can do.

With a tun interface this can be done. Now you basically want to drop
this feature that has existed for a long time and break existing setups.

Amicus Plato, sed magis amica veritas

Yes, I don't want to see this interface misbehaviour advertised as a 
security feature. I hope the previous email gives a detailed 
explanation why.

Let's forget about the traffic leak mention and the "security feature".
That comment was probably written in the middle of the night and I agree
it gives a false sense of what is happening.

If it's going to break existing setups, then end-users can be supported
with a changelog notice explaining how to properly address the risk of
traffic leaking.

Nope, we can't just break existing user setups.

In some circumstances, e.g. an Android app, it could be the only way to
prevent traffic leaking. But these special circumstances do not make the
solution generic and eligible for inclusion into the mainline code.

Why not? We are not changing the general rule, but just defining a 
specific behaviour for a specific driver.

Yeah. This patch is not changing the general rule. The patch breaks it,
and the comment in the code takes pride in that. It looks like the old
joke that a documented bug becomes a feature.

Like I said above, let's make the comment meaningful for the expected 
goal: implement persist-tun while leaving userspace the chance to decide 
what to do.

From a system administrator's or a firmware developer's perspective, the
proposed behaviour will look like an inconsistency compared to other
interface types. And this inconsistency needs to be addressed with
special configuration or, in the worst case, a dedicated script. And I
cannot see a justified reason to make their life harder.

You can configure openvpn to bring the interface down when the 
connection is lost. Why do you say it requires extra scripting and such?

Being administratively down and being operationally down are different 
states.
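
To illustrate the difference, here is a minimal sketch that decodes an
interface flags bitmask as returned by SIOCGIFFLAGS or rtnetlink. The
IFF_UP and IFF_RUNNING values are the ones from <linux/if.h>; the
function name and the sample bitmasks are made up for illustration.

```python
# Flag values as defined in <linux/if.h>.
IFF_UP = 0x1        # administrative state, e.g. "ip link set dev X up"
IFF_RUNNING = 0x40  # operational state: the link is operationally up


def describe_flags(flags: int) -> str:
    """Map an interface flags bitmask to a human-readable state.

    describe_flags is a hypothetical helper, not part of any library.
    """
    if not flags & IFF_UP:
        return "administratively down"
    if not flags & IFF_RUNNING:
        return "administratively up, operationally down"
    return "up and running"


if __name__ == "__main__":
    # Sample bitmasks: BROADCAST|MULTICAST plus the bits of interest.
    for sample in (0x1002, 0x1003, 0x1043):
        print(hex(sample), "->", describe_flags(sample))
```

An administrator controls only the first bit; the second one is
supposed to reflect what the device actually knows about the link.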

For example, I don't think a tun interface goes down when there is no 
socket attached to it, still packets are just going to be blackhole'd 
in that case. No?

Nope. A tun interface will indeed go into the non-running state on the
detach event. Moreover, the tun module supports changing the
running/non-running indication upon a command from userspace. But not
every userspace application feels a desire to implement it.

With 'ovpn' we basically want a similar effect: let userspace decide 
what to do depending on the configuration.

I know it can be implemented in many other different ways... but I
don't see a real problem with keeping it this way.

At least routing protocols and network monitoring software will not 
be happy to see a dead interface pretending that it's still running. 

They won't know that the interface is disconnected, they will 
possibly just see traffic being dropped.

Packet loss detection is quite a complex operation. So yes, they are
indeed monitoring the interface operational state to warn the operator
as soon as possible, and to take some automatic actions if we are
talking about routing protocols. Some sophisticated monitoring systems
are even capable of generating events like 'link unstable' with higher
severity if they see the interface operational state flapping within a
short period of time.

So yeah, for these kinds of systems, proper operational state 
indication is essential.
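
As a rough sketch of what such a monitoring system does with the
operational state, consider a sliding-window flap detector. Everything
here (the FlapDetector class, its thresholds, the event names) is
hypothetical and only meant to show why the indication matters; it is
not taken from any particular monitoring product.

```python
from collections import deque


class FlapDetector:
    """Flag a link as unstable when its operational state changes too often."""

    def __init__(self, max_changes: int = 3, window_s: float = 60.0):
        self.max_changes = max_changes  # changes tolerated within the window
        self.window_s = window_s        # sliding window length in seconds
        self._last_state = None
        self._changes = deque()         # timestamps of recent state changes

    def observe(self, timestamp: float, running: bool) -> str:
        """Feed one sample; return 'ok', 'link down' or 'link unstable'."""
        if self._last_state is not None and running != self._last_state:
            self._changes.append(timestamp)
        self._last_state = running
        # Forget changes that fell out of the sliding window.
        while self._changes and timestamp - self._changes[0] > self.window_s:
            self._changes.popleft()
        if len(self._changes) >= self.max_changes:
            return "link unstable"
        return "ok" if running else "link down"


if __name__ == "__main__":
    detector = FlapDetector()
    for ts, running in [(0, True), (10, False), (20, True), (30, False)]:
        print(ts, detector.observe(ts, running))
```

If the interface always reports the running state, a detector like this
one never sees a transition and never raises either event.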

Again, if the user has not explicitly allowed the persistent behaviour, 
the interface will be brought down when a disconnection happens.
But if the user/administrator *wants* to avoid that, he needs a
chance to do that.

Otherwise people who need this behaviour will just have to stick to
using tun and the full userspace implementation.

Generally speaking, saying that the interface is running when the module
knows for sure that a packet cannot be delivered is misleading the user.

Or a feature, wanted by the user.

A blackhole/firewall can still be added if the user prefers (and 
not use the persistent interface).

The solution with the false indication is not as reliable as it might
look. Interface shutdown, inability of a user-space application to
start, user-space application crash, user-space application restart:
each of them will void the trick. Ergo, a blackhole/firewall must be
employed by a security-concerned user, which makes the proposed
feature odd.

Yeah, this is what other VPN clients call "kill switch".
Persist-tun is just one piece of the puzzle, yet important.

To summarize, I'm OK if this change is merged with a comment
like "For future study" or "To be done" or "To be implemented". But
a comment like "to prevent traffic leaking" or any other comment
implying a "breakthrough security feature" will get a big NACK from
my side.

What if the comment redirects the user to --persist-tun option in 
order to clarify the context and the wanted behaviour?

Would that help?

Nope. As was mentioned above, there is no indication that 'persist-tun'
is a 'security' feature even in the current openvpn documentation.

Like I mentioned above, I agree we should get rid of that sentence.
The security feature must be implemented by means of extra tools, just 
the interface staying up is not enough.

If the openvpn developers want to keep implementation bug-to-bug 
compatible, then feel free to configure the blackhole route on behalf 
of end-user by means of the userspace daemon. Nobody will mind.

bug-to-bug compatible? What do you mean?

http://www.jargon.net/jargonfile/b/bug-compatible.html

With the difference that the local operational state indication does
not break compatibility between hosts.

Having userspace configure a blackhole route is something that can be
considered by whoever decides to implement the "kill switch" feature.
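
For illustration only, a userspace helper could build such a route with
iproute2 along these lines. This is a sketch under assumptions: the
blackhole_route_cmd name and the default metric are made up, and it
presumes the VPN installs its own routes with a lower metric, so the
blackhole route only takes over once those routes are gone.

```python
import ipaddress
from typing import List


def blackhole_route_cmd(prefix: str, metric: int = 4096) -> List[str]:
    """Build an iproute2 command line adding a blackhole route for prefix.

    Hypothetical helper; the caller decides whether and how to run it.
    """
    net = ipaddress.ip_network(prefix)  # raises ValueError on bad input
    family = "-6" if net.version == 6 else "-4"
    return ["ip", family, "route", "add", "blackhole", str(net),
            "metric", str(metric)]


if __name__ == "__main__":
    # Print the command instead of running it; adding routes needs root.
    print(" ".join(blackhole_route_cmd("10.8.0.0/24")))
```

The command could then be passed to subprocess.run() by a privileged
process; the example only prints it.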

OpenVPN does not. It just implements --persist-tun.

So all in all, the conclusion is that in this case it's up to userspace
to decide when the interface should go up and down, depending on the
configuration. I'd like to keep it as it is to avoid having the ovpn
interface make decisions on its own.

I can spell this out in the comment (I think it definitely makes sense), 
to clarify that the netcarrier is expected to be driven by userspace 
(where the control plane is) rather than having the device make 
decisions without having the full picture.

What do you think?

It wasn't suggested to destroy the interface when it becomes
non-operational. I apologize if something I wrote earlier sounded like
that. The interface's existence is not in question. It's going to be
solidly persistent.

Back to the proposed rephrasing. If the 'full picture' means forcing the
running state indication even when the netdev is not capable of
delivering packets, then it looks like an attempt to hide the control
knob of the misleading feature somewhere else.

And since the concept of on-purpose false indication is still here, many 
words regarding the control plane and a full picture do not sound good 
either.

--
Sergey