Re: [RFC PATCH bpf-next 0/3] Avoid skipping sockets with socket iterators

Jordan Rife <jrife@xxxxxxxxxx> · Tue, 18 Mar 2025 16:09:08 -0700

To add to this, I actually encountered some strange behavior today
where using bpf_sock_destroy actually /causes/ sockets to repeat
during iteration. In my environment, I just have one socket in a
network namespace with a socket iterator that destroys it. The
iterator visits the same socket twice and calls bpf_sock_destroy twice
as a result. In the UDP case (and maybe TCP, I haven't checked),
bpf_sock_destroy() can call udp_abort() (sk->sk_prot->diag_destroy()) ->
__udp_disconnect() -> udp_v4_rehash() (sk->sk_prot->rehash(sk)), which
rehashes the socket and moves it to a new bucket. Depending on where a
socket lands, you may encounter it again as you progress through the
buckets. Doing some inspection with bpftrace seems to confirm this.
Unlike the edge cases I described before, this one is likely to show
up in practice. I noticed it when I used bpf_seq_write() to emit a
record for every socket that got destroyed, so that userspace could
get an accurate count at the end, which seems like a fairly valid use
case.
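
To illustrate the effect, here is a toy userspace model. It is a
sketch only, not kernel code: struct toy_sock, toy_destroy() and
toy_iterate() are hypothetical names, and the "rehash" is reduced to
changing a bucket number. It walks the buckets in order and destroys
whatever it finds, the way the iterator program does.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8

/* Toy socket: only the fields this model needs (hypothetical). */
struct toy_sock {
	int bucket;	/* bucket the socket currently hashes to */
	int closed;	/* set once the socket has been "destroyed" */
};

/*
 * Stand-in for bpf_sock_destroy() on a UDP socket: mark the socket
 * closed and rehash it, which moves it to new_bucket.
 */
static void toy_destroy(struct toy_sock *sk, int new_bucket)
{
	assert(new_bucket >= 0 && new_bucket < NBUCKETS);
	sk->closed = 1;
	sk->bucket = new_bucket;
}

/*
 * Walk buckets 0..NBUCKETS-1 in order and destroy every socket found
 * in the current bucket. Returns the number of toy_destroy() calls.
 */
static int toy_iterate(struct toy_sock *socks, size_t n, int rehash_to)
{
	int calls = 0;

	for (int b = 0; b < NBUCKETS; b++) {
		for (size_t i = 0; i < n; i++) {
			if (socks[i].bucket != b)
				continue;
			toy_destroy(&socks[i], rehash_to);
			calls++;
		}
	}
	return calls;
}
```

With a single socket starting in bucket 2, toy_iterate(&sk, 1, 5)
returns 2 because the socket lands in a bucket that has not been
visited yet, while toy_iterate(&sk, 1, 1) returns 1 because bucket 1
has already been passed.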

I'm not sure what the best way to avoid this is. __udp_disconnect()
sets sk->sk_state to TCP_CLOSE, so filtering out sockets in that state
during iteration would avoid repeating sockets you've destroyed, but
it may be a bit coarse-grained; you could inadvertently skip other
sockets that you don't want to skip. The approach in the RFC would work, since you
could just avoid any sockets where abs(sk->sk_idx) > whatever the
table version was when you started iterating, basically iterating only
over what was in your initial "table snapshot", but maybe there's a
simpler approach.
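
A toy userspace comparison of the two filters, again only a sketch
under assumptions: every name below is hypothetical, idx stands in for
sk->sk_idx, and I'm assuming that a rehash stamps the socket with a
newer table version, which may not match the RFC exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 8

enum toy_filter { FILTER_NONE, FILTER_CLOSED, FILTER_SNAPSHOT };

struct toy_table {
	long version;	/* bumped on every (re)hash in this model */
};

struct toy_sock {
	int bucket;	/* bucket the socket currently hashes to */
	int closed;	/* analogue of sk->sk_state == TCP_CLOSE */
	long idx;	/* hypothetical analogue of sk->sk_idx */
};

/* Close the socket and rehash it into new_bucket with a fresh idx. */
static void toy_destroy(struct toy_table *t, struct toy_sock *sk,
			int new_bucket)
{
	assert(new_bucket >= 0 && new_bucket < NBUCKETS);
	sk->closed = 1;
	sk->bucket = new_bucket;
	sk->idx = ++t->version;	/* assumption: rehash gets a newer idx */
}

/* Walk the buckets in order; return the number of toy_destroy() calls. */
static int toy_iterate(struct toy_table *t, struct toy_sock *socks,
		       size_t n, int rehash_to, enum toy_filter filter)
{
	long snapshot = t->version;	/* table version at iteration start */
	int calls = 0;

	for (int b = 0; b < NBUCKETS; b++) {
		for (size_t i = 0; i < n; i++) {
			struct toy_sock *sk = &socks[i];

			if (sk->bucket != b)
				continue;
			/* Coarse: skips anything closed, ours or not. */
			if (filter == FILTER_CLOSED && sk->closed)
				continue;
			/* Skips only what was (re)hashed after we started. */
			if (filter == FILTER_SNAPSHOT &&
			    labs(sk->idx) > snapshot)
				continue;
			toy_destroy(t, sk, rehash_to);
			calls++;
		}
	}
	return calls;
}
```

For an open socket in bucket 2 that rehashes to bucket 5, FILTER_NONE
gives two calls and both filters give one. For a socket that was
already closed before iteration started, FILTER_CLOSED skips it
entirely while FILTER_SNAPSHOT still visits it once, which is the
coarse-grained behavior described above.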

-Jordan