2025-01-17, 13:59:35 +0100, Antonio Quartulli wrote: > On 17/01/2025 12:48, Sabrina Dubroca wrote: > > 2025-01-13, 10:31:39 +0100, Antonio Quartulli wrote: > > > int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info) > > > { > > > - return -EOPNOTSUPP; > > > + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; > > > + struct ovpn_priv *ovpn = info->user_ptr[0]; > > > + struct ovpn_socket *ovpn_sock; > > > + struct socket *sock = NULL; > > > + struct ovpn_peer *peer; > > > + u32 sockfd, peer_id; > > > + int ret; > > > + > > > + /* peers can only be added when the interface is up and running */ > > > + if (!netif_running(ovpn->dev)) > > > + return -ENETDOWN; > > > > Since we're not under rtnl_lock here, the device could go down while > > we're creating this peer, and we may end up with a down device that > > has a peer anyway. > > hmm, indeed. This means we must hold the rtnl_lock to prevent ending up in > an inconsistent state. > > > > > I'm not sure what this (and the peer flushing on NETDEV_DOWN) is > > trying to accomplish. Is it a problem to keep peers when the netdevice > > is down? > > This is the result of my discussion with Sergey that started in v23 5/23: > > https://lore.kernel.org/r/netdev/20241029-b4-ovpn-v11-5-de4698c73a25@xxxxxxxxxxx/ > > The idea was to match operational state with actual connectivity to peer(s). > > Originally I wanted to simply kee the carrier always on, but after further > discussion (including the meaning of the openvpn option --persist-tun) we > agreed on following the logic where an UP device has a peer connected (logic > is slightly different between MP and P2P). > > I am not extremely happy with the resulting complexity, but it seemed to be > blocker for Sergey. [after re-reading that discussion with Sergey] I don't understand why "admin does 'ip link set tun0 down'" means "we should get rid of all peers. For me the carrier situation goes the other way: no peer, no carrier (as if I unplugged the cable from my ethernet card), and it's independent of what the user does (ip link set XXX up/down). You have that with netif_carrier_{on,off}, but flushing peers when the admin does "ip link set tun0 down" is separate IMO. [...] > > > int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) > > > { > > > - return -EOPNOTSUPP; > > > + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; > > > + struct ovpn_priv *ovpn = info->user_ptr[0]; > > > + struct ovpn_peer *peer; > > > + u32 peer_id; > > > + int ret; > > > + > > > + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) > > > + return -EINVAL; > > > + > > > + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER], > > > + ovpn_peer_nl_policy, info->extack); > > > + if (ret) > > > + return ret; > > > + > > > + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, > > > + OVPN_A_PEER_ID)) > > > + return -EINVAL; > > > + > > > + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]); > > > + peer = ovpn_peer_get_by_id(ovpn, peer_id); > > > + if (!peer) { > > > + NL_SET_ERR_MSG_FMT_MOD(info->extack, > > > + "cannot find peer with id %u", peer_id); > > > + return -ENOENT; > > > + } > > > + > > > + netdev_dbg(ovpn->dev, "del peer %u\n", peer->id); > > > + ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE); > > > > With the delayed socket release (which is similar to what was in v11, > > but now with refcounting on the netdevice which should make > > rtnl_link_unregister in ovpn_cleanup wait [*]), we may return to > > userspace as if the peer was gone, but the socket hasn't been detached > > yet. > > > > A userspace application that tries to remove the peer and immediately > > re-create it with the same socket could get EBUSY if the workqueue > > hasn't done its job yet. That would be quite confusing to the > > application. > > This may happen only for TCP, because in the UDP case we would increase the > refcounter and keep the socket attached. Not if we're re-attaching to a different ovpn instance/netdevice. > > However, re-attaching the same TCP socket is hardly going to happen (in TCP > we have one socket per peer, therefore if the peer is going away, we're most > likely killing the socket too). > > This said, the complexity added by the completion seems quite tiny, > therefore I'll add the code you are suggesting below. Ok. -- Sabrina