2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote: > On 26/11/2024 09:49, Antonio Quartulli wrote: > [...] > > > > > > The potential issue is tricky since we create it patch-by-patch. > > > > > > Up to this patch the socket releasing procedure looks solid and > > > reliable. E.g. the P2P netdev destroying: > > > > > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > > > ovpn_peer_release_p2p > > > ovpn_peer_del_p2p > > > ovpn_peer_put > > > ovpn_peer_release_kref > > > ovpn_peer_release > > > ovpn_socket_put > > > ovpn_socket_release_kref > > > ovpn_socket_detach > > > ovpn_udp_socket_detach > > > setup_udp_tunnel_sock > > > netdev_run_todo > > > rcu_barrier <- no running ovpn_udp_encap_recv after this point > > > free_netdev > > > > > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() > > > will be spawned. And after the rcu_barrier() all running > > > ovpn_udp_encap_recv() will be done. All good. > > > > > > > ok > > > > > Then, the following patch 'ovpn: implement TCP transport' disjoin > > > ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling > > > the socket detach function call: > > > > > > ovpn_socket_release_kref > > > ovpn_socket_schedule_release > > > schedule_work(&sock->work) > > > > > > And long time after the socket will be actually detached: > > > > > > ovpn_socket_release_work > > > ovpn_socket_detach > > > ovpn_udp_socket_detach > > > setup_udp_tunnel_sock > > > > > > And until this detaching will take a place, UDP handler can call > > > ovpn_udp_encap_recv() whatever number of times. > > > > > > So, we can end up with this scenario: > > > > > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > > > ovpn_peer_release_p2p > > > ovpn_peer_del_p2p > > > ovpn_peer_put > > > ovpn_peer_release_kref > > > ovpn_peer_release > > > ovpn_socket_put > > > ovpn_socket_release_kref > > > ovpn_socket_schedule_release > > > schedule_work(&sock->work) > > > netdev_run_todo > > > rcu_barrier > > > free_netdev > > > > > > ovpn_udp_encap_recv <- called for an incoming UDP packet > > > ovpn_from_udp_sock <- returns pointer to freed memory > > > // Any access to ovpn pointer is the use-after-free > > > > > > ovpn_socket_release_work <- kernel finally ivoke the work > > > ovpn_socket_detach > > > ovpn_udp_socket_detach > > > setup_udp_tunnel_sock > > > > > > To address the issue, I see two possible solutions: > > > 1. flush the workqueue somewhere before the netdev release > > > > yes! This is what I was missing. This will also solve the "how can the > > module wait for all workers to be done before unloading?" > > > > Actually there might be even a simpler solution: each ovpn_socket will hold > a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). > I can simply increase the refcounter those objects while they are referenced > by the socket and decrease it when the socket is fully released (in the > detach() function called by the worker). > > This way the netdev cannot be released until all socket (and all peers) are > gone. > > This approach doesn't require any local workqueue or any other special > coordination as we'll just force the whole cleanup to happen in a specific > order. > > Does it make sense? This dependency between refcounts worries me. I'm already having a hard time remembering how all objects interact together. And since ovpn_peer_release already calls ovpn_socket_put, you'd get a refcount loop if ovpn_socket now also has a ref on the peer, no? -- Sabrina