Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

Antonio Quartulli <antonio@xxxxxxxxxxx> · Fri, 15 Nov 2024 16:02:27 +0100

On 11/11/2024 02:54, Sergey Ryazanov wrote:
[...]

+/* Called after decrypt to write the IP packet to the device.
+ * This method is expected to manage/free the skb.
+ */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+    unsigned int pkt_len;
+
+    /* we can't guarantee the packet wasn't corrupted before entering the
+     * VPN, therefore we give other layers a chance to check that
+     */
+    skb->ip_summed = CHECKSUM_NONE;
+
+    /* skb hash for transport packet no longer valid after decapsulation */
+    skb_clear_hash(skb);
+
+    /* post-decrypt scrub -- prepare to inject encapsulated packet onto the
+     * interface, based on __skb_tunnel_rx() in dst.h
+     */
+    skb->dev = peer->ovpn->dev;
+    skb_set_queue_mapping(skb, 0);
+    skb_scrub_packet(skb, true);
+

The skb->protocol field is going to be updated in the upcoming patch in 
the caller (ovpn_decrypt_post). Shall we put a comment here clarifying 
why we do not touch the protocol field here?

Well, I would personally not document missing details in a partly 
implemented code path.


+    skb_reset_network_header(skb);

ovpn_decrypt_post() has already reset the network header. Why do we need 
to do it here again?

yeah, I think this can be removed.


+    skb_reset_transport_header(skb);
+    skb_probe_transport_header(skb);
+    skb_reset_inner_headers(skb);
+
+    memset(skb->cb, 0, sizeof(skb->cb));

Why do we need to zero the control buffer here?

To prevent the next layer from assuming the cb is clean when it is not.
Other drivers do the same as well.

I think this was recommended by Sabrina as well.


+    /* cause packet to be "received" by the interface */
+    pkt_len = skb->len;
+    if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
+                     skb) == NET_RX_SUCCESS))
+        /* update RX stats with the size of decrypted packet */
+        dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
+}
+
+static void ovpn_decrypt_post(struct sk_buff *skb, int ret)
+{
+    struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
+
+    if (unlikely(ret < 0))
+        goto drop;
+
+    ovpn_netdev_write(peer, skb);
+    /* skb is passed to upper layer - don't free it */
+    skb = NULL;
+drop:
+    if (unlikely(skb))
+        dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
+    ovpn_peer_put(peer);
+    kfree_skb(skb);
+}
+
+/* pick next packet from RX queue, decrypt and forward it to the device */

The function now receives packets from external callers. Should we 
update the above comment?

yap will do.

[...]

--- /dev/null
+++ b/drivers/net/ovpn/proto.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:    Antonio Quartulli <antonio@xxxxxxxxxxx>
+ *        James Yonan <james@xxxxxxxxxxx>
+ */
+
+#ifndef _NET_OVPN_OVPNPROTO_H_
+#define _NET_OVPN_OVPNPROTO_H_
+
+#include "main.h"
+
+#include <linux/skbuff.h>
+
+/* Methods for operating on the initial command
+ * byte of the OpenVPN protocol.
+ */
+
+/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in
+ * one byte
+ */
+#define OVPN_KEY_ID_MASK 0x07
+#define OVPN_OPCODE_SHIFT 3
+#define OVPN_OPCODE_MASK 0x1F

Instead of defining mask(s) and shift(s), we can define only masks and 
use the bitfield API (see below).

+/* upper bounds on opcode and key ID */
+#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1)
+#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1)
+/* packet opcodes of interest to us */
+#define OVPN_DATA_V1 6 /* data channel V1 packet */
+#define OVPN_DATA_V2 9 /* data channel V2 packet */
+/* size of initial packet opcode */
+#define OVPN_OP_SIZE_V1 1
+#define OVPN_OP_SIZE_V2    4
+#define OVPN_PEER_ID_MASK 0x00FFFFFF
+#define OVPN_PEER_ID_UNDEF 0x00FFFFFF
+/* first byte of keepalive message */
+#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a
+/* first byte of exit message */
+#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28

From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, 
OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and 
OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and look like they 
should be removed.

ACK


+/**
+ * ovpn_opcode_from_skb - extract OP code from skb at specified offset
+ * @skb: the packet to extract the OP code from
+ * @offset: the offset in the data buffer where the OP code is located
+ *
+ * Note: this function assumes that the skb head was pulled enough
+ * to access the first byte.
+ *
+ * Return: the OP code
+ */
+static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset)
+{
+    u8 byte = *(skb->data + offset);
+
+    return byte >> OVPN_OPCODE_SHIFT;

For example, here the shift can be replaced with a bitfield macro:

#define OVPN_OPCODE_PKTTYPE_MSK  0xf8000000
#define OVPN_OPCODE_KEYID_MSK    0x07000000
#define OVPN_OPCODE_PEERID_MSK   0x00ffffff

static inline u8 ovpn_opcode_from_skb(...)
{
     u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset));

     return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode);
}

And the upcoming ovpn_opcode_compose() can be implemented like this:

static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
{
     return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) |
            FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) |
            FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id);
}

And at this size it can even be embedded into ovpn_aead_encrypt() to make 
the header composition clearer.

I wasn't aware of the bitfield API.

Yeah, it looks cleaner and gives a better definition of the first 4 
bytes of the header.

There is also GENMASK(), which helps with creating masks instead of 
hardcoding the bits in hex.

Will give it a try, thanks!
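
To make the GENMASK() idea concrete, here is a minimal user-space sketch of
how the first 4 header bytes could be described with masks only. The
SKETCH_* macros and sketch_* helpers are hypothetical stand-ins written for
this illustration (32-bit only, no compile-time checks); in the kernel one
would use GENMASK() from <linux/bits.h> and FIELD_GET()/FIELD_PREP() from
<linux/bitfield.h> instead. The mask names are the ones proposed above.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (assumption: 32-bit fields). */
#define SKETCH_GENMASK(h, l) \
	((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))
/* lowest set bit of a mask, used to shift without a separate SHIFT macro */
#define SKETCH_LSB(mask)           ((mask) & (~(mask) + 1))
#define SKETCH_FIELD_GET(mask, v)  (((v) & (mask)) / SKETCH_LSB(mask))
#define SKETCH_FIELD_PREP(mask, v) (((uint32_t)(v) * SKETCH_LSB(mask)) & (mask))

#define OVPN_OPCODE_PKTTYPE_MSK SKETCH_GENMASK(31, 27) /* 0xf8000000 */
#define OVPN_OPCODE_KEYID_MSK   SKETCH_GENMASK(26, 24) /* 0x07000000 */
#define OVPN_OPCODE_PEERID_MSK  SKETCH_GENMASK(23, 0)  /* 0x00ffffff */

/* Read 4 bytes in network (big-endian) order, independent of host order. */
static inline uint32_t sketch_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static inline uint8_t sketch_opcode_from_buf(const uint8_t *data)
{
	return (uint8_t)SKETCH_FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK,
					 sketch_load_be32(data));
}

static inline uint8_t sketch_key_id_from_buf(const uint8_t *data)
{
	return (uint8_t)SKETCH_FIELD_GET(OVPN_OPCODE_KEYID_MSK,
					 sketch_load_be32(data));
}

static inline uint32_t sketch_peer_id_from_buf(const uint8_t *data)
{
	return SKETCH_FIELD_GET(OVPN_OPCODE_PEERID_MSK, sketch_load_be32(data));
}

/* Build the 32-bit header word in host order; callers would still need to
 * convert it to network order before writing it to the wire.
 */
static inline uint32_t sketch_opcode_compose(uint8_t opcode, uint8_t key_id,
					     uint32_t peer_id)
{
	/* each value must fit its field, otherwise bits would be lost */
	assert(opcode <= 0x1f);
	assert(key_id <= 0x07);
	assert(peer_id <= 0x00ffffff);

	return SKETCH_FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) |
	       SKETCH_FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) |
	       SKETCH_FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id);
}
```

With this layout, a DATA_V2 header (opcode 9) with key ID 1 and peer ID
0x123456 is the word 0x49123456, i.e. the bytes 49 12 34 56 on the wire.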


+}
+
+/**
+ * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset
+ * @skb: the packet to extract the peer ID from
+ * @offset: the offset in the data buffer where the OP code is located
+ *
+ * Note: this function assumes that the skb head was pulled enough
+ * to access the first 4 bytes.
+ *
+ * Return: the peer ID.
+ */
+static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset)
+{
+    return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK;
+}
+
+#endif /* _NET_OVPN_OVPNPROTO_H_ */

diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644
--- a/drivers/net/ovpn/socket.c
+++ b/drivers/net/ovpn/socket.c
@@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock)
      if (!sock)
          return;
+    if (sock->sk->sk_protocol == IPPROTO_UDP)
+        ovpn_udp_socket_detach(sock);
+
      sockfd_put(sock);
  }
@@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
      return ret;
  }
+/* Retrieve the corresponding ovpn object from a UDP socket
+ * rcu_read_lock must be held on entry
+ */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
+{
+    struct ovpn_socket *ovpn_sock;
+
+    if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP))
+        return NULL;
+
+    ovpn_sock = rcu_dereference_sk_user_data(sk);
+    if (unlikely(!ovpn_sock))
+        return NULL;
+
+    /* make sure that sk matches our stored transport socket */
+    if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
+        return NULL;
+
+    return ovpn_sock->ovpn;

Now, returning this pointer is safe. But the following TCP transport 
support calls the socket release via a scheduled work, which extends the 
socket lifetime and makes it possible to receive a UDP packet way after 
the interface private data release. Is this a correct assumption?

Sorry, you lost me when saying "following *TCP* transport support calls".
This function is invoked only in UDP context.
Was that a typo?


If the above is right, then shall we set ->ovpn = NULL before scheduling 
the socket release work, or otherwise mark the socket as 
half-destroyed?

+}
+
  /**
   * ovpn_socket_new - create a new socket and initialize it
   * @sock: the kernel socket to embed
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -21,9 +21,95 @@
  #include "bind.h"
  #include "io.h"
  #include "peer.h"
+#include "proto.h"
  #include "socket.h"
  #include "udp.h"
+/**
+ * ovpn_udp_encap_recv - Start processing a received UDP packet.
+ * @sk: socket over which the packet was received
+ * @skb: the received packet
+ *
+ * If the first byte of the payload is DATA_V2, the packet is further processed,
+ * otherwise it is forwarded to the UDP stack for delivery to user space.
+ *
+ * Return:
+ *  0 if skb was consumed or dropped
+ * >0 if skb should be passed up to userspace as UDP (packet not consumed)
+ * <0 if skb should be resubmitted as proto -N (packet not consumed)
+ */
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
+{
+    struct ovpn_peer *peer = NULL;
+    struct ovpn_struct *ovpn;
+    u32 peer_id;
+    u8 opcode;
+
+    ovpn = ovpn_from_udp_sock(sk);
+    if (unlikely(!ovpn)) {
+        net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
+                    __func__);

We should probably zero the ovpn pointer in the ovpn_sock to survive a 
scheduled socket release (see comment in ovpn_from_udp_sock). So, this 
print should be removed to avoid printing misleading errors.

I am also not following this. ovpn is already NULL if we are entering 
this branch, no?

And I think this condition is quite improbable as well.


+        goto drop_noovpn;
+    }
+
+    /* Make sure the first 4 bytes of the skb data buffer after the UDP
+     * header are accessible.
+     * They are required to fetch the OP code, the key ID and the peer ID.
+     */
+    if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) +
+                    OVPN_OP_SIZE_V2))) {
+        net_dbg_ratelimited("%s: packet too small\n", __func__);
+        goto drop;
+    }
+
+    opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
+    if (unlikely(opcode != OVPN_DATA_V2)) {
+        /* DATA_V1 is not supported */
+        if (opcode == OVPN_DATA_V1)
+            goto drop;

This packet dropping turns a protocol accelerator, intended to speed up 
data packet processing, into a protocol enforcement entity, doesn't it? 
Shall we follow the principle of being liberal in what we accept and 
just forward everything besides data packets upstream to a userspace 
application?

'ovpn' only supports DATA_V2. When ovpn is in use, userspace does not 
expect any DATA packet to bubble up, as it would not know what to do 
with it.

So any decision regarding data packets should stay in 'ovpn'.

We just decided to support the modern DATA_V2 only (DATA_V1 is seldom 
used nowadays).

Moreover, it's quite impossible that a peer will send us DATA_V1 if it 
passed userspace handshake and negotiation.
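
The policy described above can be summarized as a small pure function. This
is a user-space sketch, not the driver code: sketch_classify() and enum
sketch_verdict are hypothetical names, and the constants are copied from the
proto.h hunk earlier in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* values as defined in proto.h in this patch */
#define OVPN_OPCODE_SHIFT 3
#define OVPN_DATA_V1      6
#define OVPN_DATA_V2      9
#define OVPN_OP_SIZE_V2   4

/* hypothetical verdicts, loosely mirroring the encap_recv outcomes */
enum sketch_verdict {
	SKETCH_DROP,         /* consumed and discarded by ovpn */
	SKETCH_PROCESS,      /* DATA_V2: handled by ovpn */
	SKETCH_TO_USERSPACE, /* control/unknown: left to the UDP stack */
};

/* payload points just past the UDP header, len is the payload length */
static inline enum sketch_verdict sketch_classify(const uint8_t *payload,
						  size_t len)
{
	uint8_t opcode;

	/* opcode, key ID and peer ID need the first 4 bytes */
	if (len < OVPN_OP_SIZE_V2)
		return SKETCH_DROP;

	opcode = payload[0] >> OVPN_OPCODE_SHIFT;
	if (opcode == OVPN_DATA_V2)
		return SKETCH_PROCESS;

	/* data packets never reach userspace: DATA_V1 is not supported */
	if (opcode == OVPN_DATA_V1)
		return SKETCH_DROP;

	return SKETCH_TO_USERSPACE;
}
```

The point of keeping the DATA_V1 check separate is that every data packet
gets a verdict inside ovpn, while everything else is passed up untouched.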


+
+        /* unknown or control packet: let it bubble up to userspace */
+        return 1;
+    }
+
+    peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
+    /* some OpenVPN server implementations send data packets with the
+     * peer-id set to undef. In this case we skip the peer lookup by peer-id
+     * and we try with the transport address
+     */
+    if (peer_id != OVPN_PEER_ID_UNDEF) {
+        peer = ovpn_peer_get_by_id(ovpn, peer_id);
+        if (!peer) {
+            net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
+                        __func__, peer_id);

Why do we consider a peer sending us garbage our problem? Meaning, this 
peer miss may not be our fault but a malformed packet from a 3rd party. 
E.g. nowadays I can see a lot of traces of these "active probers" 
in my OpenVPN logs. Shall we remove this message, or at least make it 
debug, to avoid bothering users with garbage traveling the Internet? 
Anyway, we cannot do anything about incoming traffic.

It could also be a peer that believes it is connected while 'ovpn' 
dropped it earlier on. So this message would help the admin/user 
understand what's going on, no?

Maybe make it an info/notice instead of error?


+            goto drop;
+        }
+    }
+
+    if (!peer) {

AFAIU, this condition can be true only when peer_id is equal to 
OVPN_PEER_ID_UNDEF, right? In this case the condition check can be 
replaced by a simple 'else' statement.


This part was actually rewritten already, so better wait for v12 before 
further discussing.

And to make the code better match the above comment regarding 
implementations that send an undefined peer-id, can we swap the sides of 
the lookup method selection? E.g.

/* Comment about fancy implementations sending undefined peer-id */
if (peer_id == OVPN_PEER_ID_UNDEF) {
   /* Do transport address based lookup */
} else {
   /* Do peer-id based lookup */
}
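
A slightly fuller user-space sketch of that ordering, for illustration only.
The sketch_* names and enum sketch_lookup are hypothetical; in the driver the
two branches would call ovpn_peer_get_by_transp_addr() and
ovpn_peer_get_by_id() respectively. The two constants are copied from
proto.h in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* values as defined in proto.h in this patch */
#define OVPN_PEER_ID_MASK  0x00FFFFFF
#define OVPN_PEER_ID_UNDEF 0x00FFFFFF

/* hypothetical: which lookup method a packet should use */
enum sketch_lookup {
	SKETCH_LOOKUP_BY_TRANSP_ADDR,
	SKETCH_LOOKUP_BY_PEER_ID,
};

/* Peer ID is the low 24 bits of the first 4 bytes, in network byte order. */
static inline uint32_t sketch_peer_id_from_buf(const uint8_t *data)
{
	uint32_t word = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
			((uint32_t)data[2] << 8) | (uint32_t)data[3];

	return word & OVPN_PEER_ID_MASK;
}

static inline enum sketch_lookup sketch_pick_lookup(uint32_t peer_id)
{
	/* some implementations send data packets with an undefined
	 * peer-id: fall back to the transport address in that case
	 */
	if (peer_id == OVPN_PEER_ID_UNDEF)
		return SKETCH_LOOKUP_BY_TRANSP_ADDR;

	return SKETCH_LOOKUP_BY_PEER_ID;
}
```

Putting the special case first keeps the explanatory comment right next to
the branch it describes.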

+        /* data packet with undef peer-id */
+        peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
+        if (unlikely(!peer)) {
+            net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n",
+                        __func__);
+            goto drop;
+        }
+    }
+
+    /* pop off outer UDP header */
+    __skb_pull(skb, sizeof(struct udphdr));
+    ovpn_recv(peer, skb);
+    return 0;
+
+drop:
+    if (peer)
+        ovpn_peer_put(peer);

AFAIU, the peer is always NULL here. Shall we remove the above check?

yeah simplified as well already.

Thanks!

Regards,

--
Antonio Quartulli
OpenVPN Inc.