Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

Sabrina Dubroca <sd@xxxxxxxxxxxxxxx> · Fri, 29 Nov 2024 17:10:43 +0100

2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
> On 15.11.2024 17:02, Antonio Quartulli wrote:
> > On 11/11/2024 02:54, Sergey Ryazanov wrote:
> > [...]
> > > > +    skb_reset_transport_header(skb);
> > > > +    skb_probe_transport_header(skb);
> > > > +    skb_reset_inner_headers(skb);
> > > > +
> > > > +    memset(skb->cb, 0, sizeof(skb->cb));
> > > 
> > > Why do we need to zero the control buffer here?
> > 
> > To avoid the next layer to assume the cb is clean while it is not.
> > Other drivers do the same as well.
> 
> AFAIR, there is no convention to clean the control buffer before the handing
> over. The common practice is a bit opposite, programmer shall not assume
> that the control buffer has been zeroed.
> 
> Not a big deal to clean it here, we just can save some CPU cycles avoiding
> it.
> 
> > I think this was recommended by Sabrina as well.
> 
> Curious. It's macsec that does not zero it, or I've not understood how it
> was done.

I only remember discussing a case [1] where one function within ovpn
was expecting a cleared skb->cb to behave correctly but the caller did
not clear it. In general, as you said, clearing cb "to be nice to
other layers" is not expected. Sorry if some comments I made were
confusing.

[1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog

> > > > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
> > > > +{
> > > > +    struct ovpn_socket *ovpn_sock;
> > > > +
> > > > +    if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) !=
> > > > UDP_ENCAP_OVPNINUDP))
> > > > +        return NULL;
> > > > +
> > > > +    ovpn_sock = rcu_dereference_sk_user_data(sk);
> > > > +    if (unlikely(!ovpn_sock))
> > > > +        return NULL;
> > > > +
> > > > +    /* make sure that sk matches our stored transport socket */
> > > > +    if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
> > > > +        return NULL;
> > > > +
> > > > +    return ovpn_sock->ovpn;
> > > 
> > > Now, returning of this pointer is safe. But the following TCP
> > > transport support calls the socket release via a scheduled work.
> > > What extends socket lifetime and makes it possible to receive a UDP
> > > packet way after the interface private data release. Is it correct
> > > assumption?
> > 
> > Sorry you lost me when sayng "following *TCP* transp[ort support calls".
> > This function is invoked only in UDP context.
> > Was that a typ0?
> 
> Yeah, you are right. The question sounds like a riddle. I should eventually
> stop composing emails at midnight. Let me paraphrase it.
> 
> The potential issue is tricky since we create it patch-by-patch.
> 
> Up to this patch the socket releasing procedure looks solid and reliable.
> E.g. the P2P netdev destroying:
> 
>   ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>     ovpn_peer_release_p2p
>       ovpn_peer_del_p2p
>         ovpn_peer_put
>           ovpn_peer_release_kref
>             ovpn_peer_release
>               ovpn_socket_put
>                 ovpn_socket_release_kref
>                   ovpn_socket_detach
>                     ovpn_udp_socket_detach
>                       setup_udp_tunnel_sock
>   netdev_run_todo
>     rcu_barrier  <- no running ovpn_udp_encap_recv after this point

It's more the synchronize_net in unregister_netdevice_many_notify?
rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu
waits for rcu_read_lock sections (see the comments for rcu_barrier and
synchronize_rcu in kernel/rcu/tree.c).

>     free_netdev
> 
> After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be
> spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will
> be done. All good.
> 
> Then, the following patch 'ovpn: implement TCP transport' disjoin
> ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket
> detach function call:
> 
>   ovpn_socket_release_kref
>     ovpn_socket_schedule_release
>       schedule_work(&sock->work)
> 
> And long time after the socket will be actually detached:
> 
>   ovpn_socket_release_work
>     ovpn_socket_detach
>       ovpn_udp_socket_detach
>         setup_udp_tunnel_sock
> 
> And until this detaching will take a place, UDP handler can call
> ovpn_udp_encap_recv() whatever number of times.
> 
> So, we can end up with this scenario:
> 
>   ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>     ovpn_peer_release_p2p
>       ovpn_peer_del_p2p
>         ovpn_peer_put
>           ovpn_peer_release_kref
>             ovpn_peer_release
>               ovpn_socket_put
>                 ovpn_socket_release_kref
>                   ovpn_socket_schedule_release
>                     schedule_work(&sock->work)
>   netdev_run_todo
>     rcu_barrier
>     free_netdev
> 
>   ovpn_udp_encap_recv  <- called for an incoming UDP packet
>     ovpn_from_udp_sock <- returns pointer to freed memory
>     // Any access to ovpn pointer is the use-after-free
> 
>   ovpn_socket_release_work  <- kernel finally ivoke the work
>     ovpn_socket_detach
>       ovpn_udp_socket_detach
>         setup_udp_tunnel_sock
> 
> To address the issue, I see two possible solutions:
> 1. flush the workqueue somewhere before the netdev release
> 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach

Going with #2, we could fully split detach into a synchronous part and
async part (with async not needed for UDP). detach_sync clears the
pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so
that no more packets will be sent through the ovpn driver.

Related to that topic, I'm not sure what's keeping a reference on the
peer to guarantee it doesn't get freed before we're done with
peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this
tcp stuff should move from the peer to ovpn_socket?

-- 
Sabrina