2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote: > On 15.11.2024 17:02, Antonio Quartulli wrote: > > On 11/11/2024 02:54, Sergey Ryazanov wrote: > > [...] > > > > + skb_reset_transport_header(skb); > > > > + skb_probe_transport_header(skb); > > > > + skb_reset_inner_headers(skb); > > > > + > > > > + memset(skb->cb, 0, sizeof(skb->cb)); > > > > > > Why do we need to zero the control buffer here? > > > > To avoid the next layer to assume the cb is clean while it is not. > > Other drivers do the same as well. > > AFAIR, there is no convention to clean the control buffer before the handing > over. The common practice is a bit opposite, programmer shall not assume > that the control buffer has been zeroed. > > Not a big deal to clean it here, we just can save some CPU cycles avoiding > it. > > > I think this was recommended by Sabrina as well. > > Curious. It's macsec that does not zero it, or I've not understood how it > was done. I only remember discussing a case [1] where one function within ovpn was expecting a cleared skb->cb to behave correctly but the caller did not clear it. In general, as you said, clearing cb "to be nice to other layers" is not expected. Sorry if some comments I made were confusing. [1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog > > > > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) > > > > +{ > > > > + struct ovpn_socket *ovpn_sock; > > > > + > > > > + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != > > > > UDP_ENCAP_OVPNINUDP)) > > > > + return NULL; > > > > + > > > > + ovpn_sock = rcu_dereference_sk_user_data(sk); > > > > + if (unlikely(!ovpn_sock)) > > > > + return NULL; > > > > + > > > > + /* make sure that sk matches our stored transport socket */ > > > > + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) > > > > + return NULL; > > > > + > > > > + return ovpn_sock->ovpn; > > > > > > Now, returning of this pointer is safe. But the following TCP > > > transport support calls the socket release via a scheduled work. > > > What extends socket lifetime and makes it possible to receive a UDP > > > packet way after the interface private data release. Is it correct > > > assumption? > > > > Sorry you lost me when sayng "following *TCP* transp[ort support calls". > > This function is invoked only in UDP context. > > Was that a typ0? > > Yeah, you are right. The question sounds like a riddle. I should eventually > stop composing emails at midnight. Let me paraphrase it. > > The potential issue is tricky since we create it patch-by-patch. > > Up to this patch the socket releasing procedure looks solid and reliable. > E.g. the P2P netdev destroying: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > netdev_run_todo > rcu_barrier <- no running ovpn_udp_encap_recv after this point It's more the synchronize_net in unregister_netdevice_many_notify? rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu waits for rcu_read_lock sections (see the comments for rcu_barrier and synchronize_rcu in kernel/rcu/tree.c). > free_netdev > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be > spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will > be done. All good. > > Then, the following patch 'ovpn: implement TCP transport' disjoin > ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket > detach function call: > > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > > And long time after the socket will be actually detached: > > ovpn_socket_release_work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > And until this detaching will take a place, UDP handler can call > ovpn_udp_encap_recv() whatever number of times. > > So, we can end up with this scenario: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > netdev_run_todo > rcu_barrier > free_netdev > > ovpn_udp_encap_recv <- called for an incoming UDP packet > ovpn_from_udp_sock <- returns pointer to freed memory > // Any access to ovpn pointer is the use-after-free > > ovpn_socket_release_work <- kernel finally ivoke the work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > To address the issue, I see two possible solutions: > 1. flush the workqueue somewhere before the netdev release > 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach Going with #2, we could fully split detach into a synchronous part and async part (with async not needed for UDP). detach_sync clears the pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so that no more packets will be sent through the ovpn driver. Related to that topic, I'm not sure what's keeping a reference on the peer to guarantee it doesn't get freed before we're done with peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this tcp stuff should move from the peer to ovpn_socket? -- Sabrina