Re: [PATCH net-next v18 20/25] ovpn: implement peer add/get/dump/delete via netlink

Sabrina Dubroca <sd@xxxxxxxxxxxxxxx> · Fri, 17 Jan 2025 18:12:46 +0100

2025-01-17, 13:59:35 +0100, Antonio Quartulli wrote:
> On 17/01/2025 12:48, Sabrina Dubroca wrote:
> > 2025-01-13, 10:31:39 +0100, Antonio Quartulli wrote:
> > >   int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info)
> > >   {
> > > -	return -EOPNOTSUPP;
> > > +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
> > > +	struct ovpn_priv *ovpn = info->user_ptr[0];
> > > +	struct ovpn_socket *ovpn_sock;
> > > +	struct socket *sock = NULL;
> > > +	struct ovpn_peer *peer;
> > > +	u32 sockfd, peer_id;
> > > +	int ret;
> > > +
> > > +	/* peers can only be added when the interface is up and running */
> > > +	if (!netif_running(ovpn->dev))
> > > +		return -ENETDOWN;
> > 
> > Since we're not under rtnl_lock here, the device could go down while
> > we're creating this peer, and we may end up with a down device that
> > has a peer anyway.
> 
> hmm, indeed. This means we must hold the rtnl_lock to prevent ending up in
> an inconsistent state.
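A minimal userspace sketch of the idea, under stated assumptions. The netif_running() check and the peer insertion happen under the same lock that the device-down path takes. Here a pthread mutex stands in for rtnl_lock. model_dev, model_peer_add() and model_dev_set_down() are hypothetical names, not ovpn code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: big_lock plays the role of rtnl_lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct model_dev {
	bool running;	/* models netif_running() */
	int npeers;	/* peers currently attached */
};

/* Check and insert under the same lock the down path takes, so the
 * device cannot go down between the check and the insertion. */
static int model_peer_add(struct model_dev *dev)
{
	int ret = 0;

	pthread_mutex_lock(&big_lock);
	if (!dev->running)
		ret = -ENETDOWN;
	else
		dev->npeers++;
	/* the invariant an unlocked check cannot guarantee */
	assert(dev->running || dev->npeers == 0);
	pthread_mutex_unlock(&big_lock);
	return ret;
}

/* Models the peer flushing on NETDEV_DOWN: mark down, then drop peers. */
static void model_dev_set_down(struct model_dev *dev)
{
	pthread_mutex_lock(&big_lock);
	dev->running = false;
	dev->npeers = 0;
	pthread_mutex_unlock(&big_lock);
}
```

If the check ran before taking the lock, a concurrent model_dev_set_down() could flush between the check and the increment. The result would be the "down device with a peer" state described above.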
> 
> > 
> > I'm not sure what this (and the peer flushing on NETDEV_DOWN) is
> > trying to accomplish. Is it a problem to keep peers when the netdevice
> > is down?
> 
> This is the result of my discussion with Sergey that started in v11 5/23:
> 
> https://lore.kernel.org/r/netdev/20241029-b4-ovpn-v11-5-de4698c73a25@xxxxxxxxxxx/
> 
> The idea was to match operational state with actual connectivity to peer(s).
> 
> Originally I wanted to simply keep the carrier always on, but after further
> discussion (including the meaning of the openvpn option --persist-tun) we
> agreed to follow the logic where an UP device has a peer connected (the logic
> is slightly different between MP and P2P).
> 
> I am not extremely happy with the resulting complexity, but it seemed to be
> a blocker for Sergey.

[after re-reading that discussion with Sergey]

I don't understand why "admin does 'ip link set tun0 down'" means "we
should get rid of all peers". For me the carrier situation goes the
other way: no peer, no carrier (as if I unplugged the cable from my
ethernet card), and it's independent of what the user does (ip link
set XXX up/down). You have that with netif_carrier_{on,off}, but
flushing peers when the admin does "ip link set tun0 down" is separate
IMO.
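A hypothetical model of the policy described above (not ovpn code). The carrier is derived only from the peer count, the way netif_carrier_on()/netif_carrier_off() would be driven. Admin up/down changes only admin state and leaves peers in place. model_if and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct model_if {
	bool admin_up;	/* what 'ip link set XXX up/down' controls */
	bool carrier;	/* models netif_carrier_on()/netif_carrier_off() */
	int npeers;
};

/* no peer, no carrier: like an unplugged Ethernet cable */
static void model_update_carrier(struct model_if *ifc)
{
	ifc->carrier = ifc->npeers > 0;
}

static void model_peer_added(struct model_if *ifc)
{
	ifc->npeers++;
	model_update_carrier(ifc);
}

static void model_peer_removed(struct model_if *ifc)
{
	assert(ifc->npeers > 0);
	ifc->npeers--;
	model_update_carrier(ifc);
}

/* Admin state is independent: going down does NOT flush peers. */
static void model_set_admin(struct model_if *ifc, bool up)
{
	ifc->admin_up = up;
}

/* Traffic flows only when the admin wants it and a peer exists. */
static bool model_oper_up(const struct model_if *ifc)
{
	return ifc->admin_up && ifc->carrier;
}
```

With this split, "ip link set tun0 down" followed by "up" brings the interface back with its peers intact. Removing the last peer drops the carrier regardless of admin state.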

[...]
> > >   int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info)
> > >   {
> > > -	return -EOPNOTSUPP;
> > > +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
> > > +	struct ovpn_priv *ovpn = info->user_ptr[0];
> > > +	struct ovpn_peer *peer;
> > > +	u32 peer_id;
> > > +	int ret;
> > > +
> > > +	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
> > > +		return -EINVAL;
> > > +
> > > +	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
> > > +			       ovpn_peer_nl_policy, info->extack);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
> > > +			      OVPN_A_PEER_ID))
> > > +		return -EINVAL;
> > > +
> > > +	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
> > > +	peer = ovpn_peer_get_by_id(ovpn, peer_id);
> > > +	if (!peer) {
> > > +		NL_SET_ERR_MSG_FMT_MOD(info->extack,
> > > +				       "cannot find peer with id %u", peer_id);
> > > +		return -ENOENT;
> > > +	}
> > > +
> > > +	netdev_dbg(ovpn->dev, "del peer %u\n", peer->id);
> > > +	ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
> > 
> > With the delayed socket release (which is similar to what was in v11,
> > but now with refcounting on the netdevice which should make
> > rtnl_link_unregister in ovpn_cleanup wait [*]), we may return to
> > userspace as if the peer was gone, but the socket hasn't been detached
> > yet.
> > 
> > A userspace application that tries to remove the peer and immediately
> > re-create it with the same socket could get EBUSY if the workqueue
> > hasn't done its job yet. That would be quite confusing to the
> > application.
> 
> This may happen only for TCP, because in the UDP case we would increase the
> refcounter and keep the socket attached.

Not if we're re-attaching to a different ovpn instance/netdevice.
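A hypothetical model of the two UDP cases (names are illustrative, not the ovpn API). Re-attaching to the same instance just takes another reference. Attaching to a different instance fails with EBUSY until the deferred detach has run. The assumption that the socket stays owned by the old instance until the work item runs mirrors the delayed release discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_ovpn {
	int id;
};

struct model_udp_sock {
	struct model_ovpn *owner;	/* instance it is attached to, or NULL */
	int refcnt;			/* peers currently using the socket */
};

static int model_sock_attach(struct model_udp_sock *sk, struct model_ovpn *ovpn)
{
	if (!sk->owner) {
		sk->owner = ovpn;
		sk->refcnt = 1;
		return 0;
	}
	if (sk->owner == ovpn && sk->refcnt > 0) {
		/* same instance: share the socket */
		sk->refcnt++;
		return 0;
	}
	/* owned by another instance, or the last user is gone but the
	 * deferred detach has not run yet */
	return -EBUSY;
}

/* Peer deletion drops a reference but leaves the socket attached. */
static void model_sock_put(struct model_udp_sock *sk)
{
	assert(sk->refcnt > 0);
	sk->refcnt--;
}

/* Models the workqueue item that finally detaches an unused socket. */
static void model_detach_work(struct model_udp_sock *sk)
{
	if (sk->refcnt == 0)
		sk->owner = NULL;
}
```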

> 
> However, re-attaching the same TCP socket is hardly going to happen (in TCP
> we have one socket per peer, therefore if the peer is going away, we're most
> likely killing the socket too).
> 
> This said, the complexity added by the completion seems quite tiny,
> therefore I'll add the code you are suggesting below.

Ok.
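The suggested code is not part of this excerpt. As a rough illustration of the general pattern only, here is a userspace model built on three stand-ins:

- A worker thread replaces the workqueue.
- A mutex/condvar pair replaces the kernel's struct completion (complete() / wait_for_completion()).
- All names are hypothetical.

The delete handler does not return until the socket is really detached. An immediate re-attach therefore cannot hit EBUSY from a pending release.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_init_completion(struct model_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct model_sock {
	bool attached;
	struct model_completion detached;
};

/* Deferred work: detach the socket, then wake up the waiter. */
static void *model_release_work(void *arg)
{
	struct model_sock *sk = arg;

	sk->attached = false;
	model_complete(&sk->detached);
	return NULL;
}

/* Peer delete: defer the release, but wait for it before returning. */
static int model_peer_del(struct model_sock *sk)
{
	pthread_t worker;

	model_init_completion(&sk->detached);
	if (pthread_create(&worker, NULL, model_release_work, sk))
		return -1;
	model_wait_for_completion(&sk->detached);
	/* the completion's lock orders the worker's write before this read */
	assert(!sk->attached);
	pthread_join(worker, NULL);
	pthread_cond_destroy(&sk->detached.cond);
	pthread_mutex_destroy(&sk->detached.lock);
	return 0;
}
```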

-- 
Sabrina