Re: [RFC PATCH bpf-next 0/3] Avoid skipping sockets with socket iterators

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Tue, 18 Mar 2025 16:32:55 -0700

On 3/18/25 4:09 PM, Jordan Rife wrote:
> To add to this, I actually encountered some strange behavior today
> where using bpf_sock_destroy actually /causes/ sockets to repeat
> during iteration. In my environment, I just have one socket in a
> network namespace with a socket iterator that destroys it. The
> iterator visits the same socket twice and calls bpf_sock_destroy twice
> as a result. In the UDP case (and maybe TCP, I haven't checked)
> bpf_sock_destroy() can call udp_abort (sk->sk_prot->diag_destroy()) ->
> __udp_disconnect() -> udp_v4_rehash() (sk->sk_prot->rehash(sk)) which
> rehashes the socket and moves it to a new bucket. Depending on where a
> socket lands, you may encounter it again as you progress through the
> buckets. Doing some inspection with bpftrace seems to confirm this. As
> opposed to the edge cases I described before, this is more likely. I
> noticed this when I tried to use bpf_seq_write to write something for
> every socket that got deleted for an accurate count at the end in
> userspace which seems like a fairly valid use case.

imo, this is not a problem for bpf. The bpf prog has access to many fields
of a udp_sock (ip addresses, ports, state, etc.) to make the right decision.
The bpf prog can decide whether that rehashed socket still needs a
bpf_sock_destroy() call, e.g. by checking the saddr, which in this case has
been reset by inet_reset_saddr(sk) before the rehash. From the bpf prog's
pov, the rehashed udp_sock is not much different from a new udp_sock that
userspace adds to a later bucket.
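
To make the argument concrete, here is a small userspace toy model in plain
C. It is not kernel or BPF code: every name in it (toy_sock, toy_table,
toy_destroy, toy_iterate, toy_table_one) is hypothetical, the bucket layout
is made up, and "saddr == 0" is only an assumed stand-in for the state that
inet_reset_saddr(sk) leaves behind. It shows a socket being visited twice
once a destroy moves it to a later bucket, and a field check keeping the
second visit from triggering a second destroy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS  4
#define MAX_SOCKS 8

/* Hypothetical stand-in for the few socket fields the check needs. */
struct toy_sock {
	uint32_t saddr;   /* assumed to be 0 after a simulated disconnect */
	int      bucket;  /* bucket the socket currently hashes to */
	bool     in_use;
};

struct toy_table {
	struct toy_sock socks[MAX_SOCKS];
};

struct toy_stats {
	int visits;    /* times the iterator saw a socket */
	int destroys;  /* times the simulated destroy ran */
};

/* Build a table holding a single socket in the given bucket. */
static struct toy_table toy_table_one(uint32_t saddr, int bucket)
{
	struct toy_table t = {0};

	assert(bucket >= 0 && bucket < NBUCKETS);
	t.socks[0].saddr = saddr;
	t.socks[0].bucket = bucket;
	t.socks[0].in_use = true;
	return t;
}

/* Simulated destroy: reset the source address, then "rehash" the socket. */
static void toy_destroy(struct toy_sock *sk, int new_bucket)
{
	assert(new_bucket >= 0 && new_bucket < NBUCKETS);
	sk->saddr = 0;
	sk->bucket = new_bucket;
}

/*
 * Walk the buckets in order, as a socket iterator would. With check_saddr
 * set, a socket whose saddr is already 0 is treated as already handled.
 */
static struct toy_stats toy_iterate(struct toy_table *t, bool check_saddr,
				    int rehash_to)
{
	struct toy_stats st = {0, 0};

	for (int b = 0; b < NBUCKETS; b++) {
		for (size_t i = 0; i < MAX_SOCKS; i++) {
			struct toy_sock *sk = &t->socks[i];

			if (!sk->in_use || sk->bucket != b)
				continue;
			st.visits++;
			if (check_saddr && sk->saddr == 0)
				continue;  /* looks already disconnected */
			toy_destroy(sk, rehash_to);
			st.destroys++;
		}
	}
	return st;
}
```

For a socket that starts in bucket 0 and is rehashed to bucket 2,
toy_iterate(&t, false, 2) reports two visits and two destroys, while
toy_iterate(&t, true, 2) still reports two visits but only one destroy. A
real prog would likely need a more careful test than "saddr == 0", since
that condition alone may also match sockets that were never destroyed.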