Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

Antonio Quartulli <antonio@xxxxxxxxxxx> · Tue, 26 Nov 2024 09:49:12 +0100

On 26/11/2024 01:32, Sergey Ryazanov wrote:
On 15.11.2024 17:02, Antonio Quartulli wrote:
On 11/11/2024 02:54, Sergey Ryazanov wrote:
[...]

+/* Called after decrypt to write the IP packet to the device.
+ * This method is expected to manage/free the skb.
+ */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct 
sk_buff *skb)
+{
+    unsigned int pkt_len;
+
+    /* we can't guarantee the packet wasn't corrupted before 
entering the
+     * VPN, therefore we give other layers a chance to check that
+     */
+    skb->ip_summed = CHECKSUM_NONE;
+
+    /* skb hash for transport packet no longer valid after 
decapsulation */
+    skb_clear_hash(skb);
+
+    /* post-decrypt scrub -- prepare to inject encapsulated packet 
onto the
+     * interface, based on __skb_tunnel_rx() in dst.h
+     */
+    skb->dev = peer->ovpn->dev;
+    skb_set_queue_mapping(skb, 0);
+    skb_scrub_packet(skb, true);
+

The skb->protocol field is going to be updated in the upcoming patch 
in the caller (ovpn_decrypt_post). Shall we put a comment here 
clarifying, why do not touch the protocol field here?

Well, I would personally not document missing details in a partly 
implemented code path.

Looks like the question wasn't precisely emphrased. By bad. Let me 
elaborate it in more details:
1. usually skb->protocol is updated just before a packet leaves a module
2. I've not found it were it was expected
3. skb->protocol is updated in the caller function - 
ovpn_decrypt_post(), along with the skb_reset_network_header() call.

The question is, shall we put some comment here in the 
ovpn_netdev_write() function elaborating that this was done in the 
caller? Or is such comment odd?

Ok, got it.
Mah personally I don't think it's truly needed. But I have no strong 
opinion.


+    skb_reset_network_header(skb);

ovpn_decrypt_post() already reseted the network header. Why do we 
need it here again?

yeah, I think this can be removed.


+    skb_reset_transport_header(skb);
+    skb_probe_transport_header(skb);
+    skb_reset_inner_headers(skb);
+
+    memset(skb->cb, 0, sizeof(skb->cb));

Why do we need to zero the control buffer here?

To avoid the next layer to assume the cb is clean while it is not.
Other drivers do the same as well.

AFAIR, there is no convention to clean the control buffer before the 
handing over. The common practice is a bit opposite, programmer shall 
not assume that the control buffer has been zeroed.

Not a big deal to clean it here, we just can save some CPU cycles 
avoiding it.

If there is no convention, then I agree with you and I'd remove it.


I think this was recommended by Sabrina as well.

Curious. It's macsec that does not zero it, or I've not understood how 
it was done.

I don't see it being zero'd. So I possibly misunderstood the suggestion.
I'll remove the memset.


+    /* cause packet to be "received" by the interface */
+    pkt_len = skb->len;
+    if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
+                     skb) == NET_RX_SUCCESS))
+        /* update RX stats with the size of decrypted packet */
+        dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
+}
+

[...]

diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
index 
090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644
--- a/drivers/net/ovpn/socket.c
+++ b/drivers/net/ovpn/socket.c
@@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock)
      if (!sock)
          return;
+    if (sock->sk->sk_protocol == IPPROTO_UDP)
+        ovpn_udp_socket_detach(sock);
+
      sockfd_put(sock);
  }
@@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket 
*sock, struct ovpn_peer *peer)
      return ret;
  }
+/* Retrieve the corresponding ovpn object from a UDP socket
+ * rcu_read_lock must be held on entry
+ */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
+{
+    struct ovpn_socket *ovpn_sock;
+
+    if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != 
UDP_ENCAP_OVPNINUDP))
+        return NULL;
+
+    ovpn_sock = rcu_dereference_sk_user_data(sk);
+    if (unlikely(!ovpn_sock))
+        return NULL;
+
+    /* make sure that sk matches our stored transport socket */
+    if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
+        return NULL;
+
+    return ovpn_sock->ovpn;

Now, returning of this pointer is safe. But the following TCP 
transport support calls the socket release via a scheduled work. What 
extends socket lifetime and makes it possible to receive a UDP packet 
way after the interface private data release. Is it correct assumption?

Sorry you lost me when sayng "following *TCP* transp[ort support calls".
This function is invoked only in UDP context.
Was that a typ0?

Yeah, you are right. The question sounds like a riddle. I should 
eventually stop composing emails at midnight. Let me paraphrase it.

:)


The potential issue is tricky since we create it patch-by-patch.

Up to this patch the socket releasing procedure looks solid and 
reliable. E.g. the P2P netdev destroying:

   ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
     ovpn_peer_release_p2p
       ovpn_peer_del_p2p
         ovpn_peer_put
           ovpn_peer_release_kref
             ovpn_peer_release
               ovpn_socket_put
                 ovpn_socket_release_kref
                   ovpn_socket_detach
                     ovpn_udp_socket_detach
                       setup_udp_tunnel_sock
   netdev_run_todo
     rcu_barrier  <- no running ovpn_udp_encap_recv after this point
     free_netdev

After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will 
be spawned. And after the rcu_barrier() all running 
ovpn_udp_encap_recv() will be done. All good.


ok

Then, the following patch 'ovpn: implement TCP transport' disjoin 
ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the 
socket detach function call:

   ovpn_socket_release_kref
     ovpn_socket_schedule_release
       schedule_work(&sock->work)

And long time after the socket will be actually detached:

   ovpn_socket_release_work
     ovpn_socket_detach
       ovpn_udp_socket_detach
         setup_udp_tunnel_sock

And until this detaching will take a place, UDP handler can call 
ovpn_udp_encap_recv() whatever number of times.

So, we can end up with this scenario:

   ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
     ovpn_peer_release_p2p
       ovpn_peer_del_p2p
         ovpn_peer_put
           ovpn_peer_release_kref
             ovpn_peer_release
               ovpn_socket_put
                 ovpn_socket_release_kref
                   ovpn_socket_schedule_release
                     schedule_work(&sock->work)
   netdev_run_todo
     rcu_barrier
     free_netdev

   ovpn_udp_encap_recv  <- called for an incoming UDP packet
     ovpn_from_udp_sock <- returns pointer to freed memory
     // Any access to ovpn pointer is the use-after-free

   ovpn_socket_release_work  <- kernel finally ivoke the work
     ovpn_socket_detach
       ovpn_udp_socket_detach
         setup_udp_tunnel_sock

To address the issue, I see two possible solutions:
1. flush the workqueue somewhere before the netdev release

yes! This is what I was missing. This will also solve the "how can the 
module wait for all workers to be done before unloading?"

2. set ovpn_sock->ovpn = NULL before scheduling the socket detach

This makes sense too. But 1 is definitely what we need.


If the above is right then shall we set ->ovpn = NULL before 
scheduling the socket releasing work or somehow else mark the socket 
as half- destroyed?

Will think about it, it may make sense to nullify ->ovpn as well.


+}
+
  /**
   * ovpn_socket_new - create a new socket and initialize it
   * @sock: the kernel socket to embed
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index 
d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -21,9 +21,95 @@
  #include "bind.h"
  #include "io.h"
  #include "peer.h"
+#include "proto.h"
  #include "socket.h"
  #include "udp.h"
+/**
+ * ovpn_udp_encap_recv - Start processing a received UDP packet.
+ * @sk: socket over which the packet was received
+ * @skb: the received packet
+ *
+ * If the first byte of the payload is DATA_V2, the packet is 
further processed,
+ * otherwise it is forwarded to the UDP stack for delivery to user 
space.
+ *
+ * Return:
+ *  0 if skb was consumed or dropped
+ * >0 if skb should be passed up to userspace as UDP (packet not 
consumed)
+ * <0 if skb should be resubmitted as proto -N (packet not consumed)
+ */
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
+{
+    struct ovpn_peer *peer = NULL;
+    struct ovpn_struct *ovpn;
+    u32 peer_id;
+    u8 opcode;
+
+    ovpn = ovpn_from_udp_sock(sk);
+    if (unlikely(!ovpn)) {
+        net_err_ratelimited("%s: cannot obtain ovpn object from UDP 
socket\n",
+                    __func__);

Probably we should zero ovpn pointer in the ovpn_sock to survive 
scheduled socket release (see comment in ovpn_from_udp_sock). So, 
this print should be removed to avoid printing misguiding errors.

I am also not following this. ovpn is already NULL if we are entering 
this branch, no?

And I think this condition is quite improbable as well.

Here, due to the scheduled nature of the detach function invocation, 
ovpn_from_udp_sock() can return us a pointer to the freed memory.

So we should prevent ovpn_udp_encap_recv() invocation after the netdev 
release by flushing the workqueue. Or we can set ovpn_sock->ovpn = NULL 
even before scheduling the socket detaching. And in this case, 
ovpn_from_udp_sock() returning NULL will be a legitimate case and we 
should drop the error printing.

ok got it. it is related with the comment above.


+        goto drop_noovpn;
+    }
+
+    /* Make sure the first 4 bytes of the skb data buffer after the 
UDP
+     * header are accessible.
+     * They are required to fetch the OP code, the key ID and the 
peer ID.
+     */
+    if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) +
+                    OVPN_OP_SIZE_V2))) {
+        net_dbg_ratelimited("%s: packet too small\n", __func__);
+        goto drop;
+    }
+
+    opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
+    if (unlikely(opcode != OVPN_DATA_V2)) {
+        /* DATA_V1 is not supported */
+        if (opcode == OVPN_DATA_V1)
+            goto drop;

This packet dropping makes protocol accelerator, intendent to speed 
up the data packets processing, a protocol enforcement entity, isn't 
it? Shall we follow the principle of beeing liberal in what we accept 
and just forward everything besides data packets upstream to a 
userspace application?

'ovpn' only supports DATA_V2. When ovpn is in use userspace does nto 
expect any DATA packet to bubble up as it would not know what to do 
with it.

So any decision regarding data packets should stay in 'ovpn'.

We just decided to support the modern DATA_V2 (DATA_V1 is seldomly 
used nowadays).

Moreover, it's quite impossible that a peer will send us DATA_V1 if it 
passed userspace handshake and negotiation.

The question was about the special handling of this packet type. If this 
packet type is unlikely, then why should the kernel take special care of 
it? Is this specific packet type going to crash the userspace application?

Not crash (hopefully) but will create confusion because it is 
unexpected. The userspace dataplane path is technically inactive when 
'ovpn' is in use.

The idea is that any DATA_V* packet should be handled in kernelspace and 
userspace should not need to care.


+
+        /* unknown or control packet: let it bubble up to userspace */
+        return 1;
+    }
+
+    peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
+    /* some OpenVPN server implementations send data packets with the
+     * peer-id set to undef. In this case we skip the peer lookup 
by peer-id
+     * and we try with the transport address
+     */
+    if (peer_id != OVPN_PEER_ID_UNDEF) {
+        peer = ovpn_peer_get_by_id(ovpn, peer_id);
+        if (!peer) {
+            net_err_ratelimited("%s: received data from unknown 
peer (id: %d)\n",
+                        __func__, peer_id);

Why do we consider a peer sending us garbage our problem? Meaning, 
this peer miss can be not our fault but a malformed packet from a 3rd 
party side. E.g. nowdays I can see a lot of traces of these "active 
probers" in my OpenVPN logs. Shall remove this message or at least 
make it debug to avoid bothering users with garbage traveling 
Internet? Anyway we can not do anything regarding incoming traffic.

It could also be a peer that believes to be connected while 'ovpn' 
dropped it earlier on. So this message would help the admin/user 
understanding what's going on. no?

It could help troubleshooting, potentionally. On the other hand, it will 
flood the kernel log with whatever junk is floating around the Internet. 
For sure.

Well, only packets having the right opcode in it and being large enough.
Because we have already dropped anything that doesn't look like a 
DATA_V2 packet at this point.


Maybe make it an info/notice instead of error?

At best it can be a debug message for developers. But IMHO the really 
best choice is to get rid of it.

But yeah, I agree with you.
Will just silently drop.


+            goto drop;
+        }
+    }

--
Sergey

Thanks,
Regards,


--
Antonio Quartulli
OpenVPN Inc.