Hi all, I wonder whether it is feasible to move BPF_SK_LOOKUP ahead of
the connected UDP socket lookup, i.e., to run the connected UDP socket
lookup after the BPF sk_lookup program. Something like:
```
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ddb86baaea6c8..9a1408775bcb1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -493,13 +493,6 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];

-	/* Lookup connected or non-wildcard socket */
-	result = udp4_lib_lookup2(net, saddr, sport,
-				  daddr, hnum, dif, sdif,
-				  hslot2, skb);
-	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
-		goto done;
-
 	/* Lookup redirect from BPF */
 	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
 	    udptable == net->ipv4.udp_table) {
@@ -512,6 +505,13 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 		}
 	}

+	/* Lookup connected or non-wildcard socket */
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  daddr, hnum, dif, sdif,
+				  hslot2, skb);
+	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
+		goto done;
+
 	/* Got non-wildcard socket or error on first lookup */
 	if (result)
 		goto done;
```
This would be useful, e.g., when there are many concurrent UDP sockets on
the same ip:port. In that case udp4_lib_lookup2() may cause high softirq
overhead, because it computes a score for every socket on that ip:port.
With a BPF sk_lookup program we could implement a 4-tuple hash for UDP
socket lookup and solve the problem, but only if the program runs before
udp4_lib_lookup2().
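
To make the cost difference concrete, here is a simplified userspace
model in C; it is not kernel code. struct usock, score(), hash4() and
struct htab4 are hypothetical stand-ins for struct sock, compute_score()
and the per-4-tuple table a sk_lookup program would keep in a map, and
the scoring is reduced to "a connected 4-tuple match beats a wildcard
match". The model only shows that the score-based walk examines every
socket on the ip:port, whereas a 4-tuple hash examines one bucket chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSOCK 1000
#define NBUCKETS_BITS 10
#define NBUCKETS (1u << NBUCKETS_BITS)

/* Hypothetical, simplified UDP socket (not the kernel's struct sock). */
struct usock {
	uint32_t laddr, raddr;	/* raddr == 0: unconnected */
	uint16_t lport, rport;
	struct usock *next4;	/* chain link in the 4-tuple hash */
};

/* Hypothetical 4-tuple hash table. */
struct htab4 {
	struct usock *bucket[NBUCKETS];
};

/* Number of sockets examined by the most recent lookup. */
unsigned long visited;

/* Simplified stand-in for compute_score(): -1 on mismatch,
 * a connected (4-tuple) match scores higher than a wildcard one. */
int score(const struct usock *sk, uint32_t saddr, uint16_t sport,
	  uint32_t daddr, uint16_t dport)
{
	int s = 0;

	if (sk->laddr != daddr || sk->lport != dport)
		return -1;
	if (sk->raddr) {
		if (sk->raddr != saddr || sk->rport != sport)
			return -1;
		s += 4;
	}
	return s;
}

/* Model of the hslot2 walk: every socket on the ip:port is scored. */
struct usock *lookup_scored(struct usock *socks, size_t n,
			    uint32_t saddr, uint16_t sport,
			    uint32_t daddr, uint16_t dport)
{
	struct usock *best = NULL;
	int best_score = -1;

	visited = 0;
	for (size_t i = 0; i < n; i++) {
		int s = score(&socks[i], saddr, sport, daddr, dport);

		visited++;
		if (s > best_score) {
			best_score = s;
			best = &socks[i];
		}
	}
	return best;
}

/* Multiplicative hash of the 4-tuple, keeping the top NBUCKETS_BITS bits. */
uint32_t hash4(uint32_t saddr, uint16_t sport, uint32_t daddr, uint16_t dport)
{
	uint32_t k = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);

	return (k * 2654435761u) >> (32 - NBUCKETS_BITS);
}

void htab4_add(struct htab4 *t, struct usock *sk)
{
	uint32_t h;

	assert(sk->raddr != 0);	/* only connected sockets have a 4-tuple */
	h = hash4(sk->raddr, sk->rport, sk->laddr, sk->lport);
	sk->next4 = t->bucket[h];
	t->bucket[h] = sk;
}

/* 4-tuple lookup: only the sockets in one bucket are examined. */
struct usock *lookup_hashed(const struct htab4 *t,
			    uint32_t saddr, uint16_t sport,
			    uint32_t daddr, uint16_t dport)
{
	struct usock *sk;

	visited = 0;
	for (sk = t->bucket[hash4(saddr, sport, daddr, dport)]; sk; sk = sk->next4) {
		visited++;
		if (sk->raddr == saddr && sk->rport == sport &&
		    sk->laddr == daddr && sk->lport == dport)
			return sk;
	}
	return NULL;
}
```

With 1000 connected sockets on 10.0.0.1:53, lookup_scored() sets visited
to 1000 for every packet, while lookup_hashed() examines only the sockets
that happen to share one bucket.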
Currently, for UDP, the BPF sk_lookup program runs after the connected
socket lookup. IIUC, this is because the early version of SK_LOOKUP [0]
redirected packets by modifying local_ip/local_port. That could interact
wrongly with the UDP lookup: UDP selects a socket by score, so rewriting
local_ip/local_port cannot guarantee which socket ends up selected.
However, the sk_lookup program now takes the socket directly from a map,
so the above problem no longer exists.
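
A simplified userspace model of that interaction, again not kernel code:
struct msock, mscore() and the two redirect_* helpers are hypothetical,
and the scoring is reduced to "a specific local address beats a
wildcard". The intended target is a wildcard-bound socket on port 8080.
If the redirect only rewrites the local address to 127.0.0.1:8080, a more
specific socket bound to 127.0.0.1:8080 wins the scoring instead, whereas
handing over the socket itself always yields the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bound UDP socket (not struct sock). */
struct msock {
	uint32_t laddr;	/* 0 means bound to INADDR_ANY (wildcard) */
	uint16_t lport;
};

/* Simplified stand-in for compute_score(): -1 on mismatch,
 * a socket bound to a specific address beats a wildcard one. */
int mscore(const struct msock *sk, uint32_t daddr, uint16_t dport)
{
	if (sk->lport != dport)
		return -1;
	if (sk->laddr == 0)
		return 0;
	return sk->laddr == daddr ? 4 : -1;
}

/* Score-based lookup: the highest-scoring socket wins. */
struct msock *lookup_best(struct msock *socks, size_t n,
			  uint32_t daddr, uint16_t dport)
{
	struct msock *best = NULL;
	int best_score = -1;

	for (size_t i = 0; i < n; i++) {
		int s = mscore(&socks[i], daddr, dport);

		if (s > best_score) {
			best_score = s;
			best = &socks[i];
		}
	}
	return best;
}

/* Early SK_LOOKUP style: the program only rewrites the local
 * address/port, and the score-based lookup still picks the socket. */
struct msock *redirect_by_rewrite(struct msock *socks, size_t n,
				  uint32_t new_daddr, uint16_t new_dport)
{
	return lookup_best(socks, n, new_daddr, new_dport);
}

/* Current style (bpf_sk_assign() in the real sk_lookup API): the
 * program hands over the socket itself, so no scoring pass can
 * override its choice. */
struct msock *redirect_by_assign(struct msock *target)
{
	assert(target != NULL);
	return target;
}
```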
So is there any other problem with this? If not, I'll work on it and
submit patches later.
[0] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@xxxxxxxxxxxxxx/
Thank you for your time.
--
Philo