On Wed, Jan 20, 2021 at 10:06 PM CET, Alexei Starovoitov wrote:
> cc-ing the right folks
>
> On Wed, Jan 20, 2021 at 12:30 PM Shanti Lombard née Bouchez-Mongardé
> <shanti20210120@xxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> I believe this is my first time here, so please excuse me for mistakes.
>> Also, please Cc me on answers.
>>
>> Background: I am currently investigating putting network services on a
>> machine without using network namespace but still keep them isolated. To
>> do that, I allocated a separate IP address (127.0.0.0/8 for IPv4 and ULA
>> prefix below fd00::/8 for IPv6) and those services are forced to listen
>> to this IP address only. For some, I use seccomp with a small utility I
>> wrote at <https://github.com/mildred/force-bind-seccomp>. Now, I still
>> want a few selected services (reverse proxies) to listen for public
>> address but they can't necessarily listen with INADDR_ANY because some
>> other services might listen on the same port on their private IP. It
>> seems SO_REUSEADDR can be used to circumvent this on BSD but not on
>> Linux. After much research, I found Cloudflare recent contribution
>> (explained here <https://blog.cloudflare.com/its-crowded-in-here/>)
>> about inet_lookup BPF programs that could replace INADDR_ANY listening.

There is also documentation in the kernel:

https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html

>> The inet_lookup BPF programs are hooking up in socket selection code for
>> incoming packets after connected packets are dispatched to their
>> respective sockets but before any new connection is dispatched to a
>> listening socket. This is well explained in the blog post.
>>
>> However, I believe that being able to hook up later in the process could
>> have great use cases. With its current position, the BPF program can
>> override any listening socket too easily. It can also be surprising for
>> administrators used to the socket API not understanding why their
>> listening socket does not receive any packet.
>>
>> Socket selection process (in net/ipv4/inet_hashtables.c function
>> __inet_lookup_listener):
>>
>> - A: look for already connected sockets (before __inet_lookup_listener)
>> - B: look for inet_lookup BPF programs
>> - C: look for listening sockets specifying address and port
>> - D: here, provide another inet_lookup BPF hook
>> - E: look for sockets listening using INADDR_ANY
>> - F: here, provide another inet_lookup BPF hook
>>
>> In position D, a BPF program could implement socket listening like
>> INADDR_ANY listening would do but without the limitation that the port
>> must not be listened on by another IP address
>>
>> In position F, a BPF program could redirect new connection attempts to a
>> socket of its choice, allowing any connection attempt to be intercepted
>> if not caught before by an already listening socket.

The existing hook is placed before the regular listening/unconnected
socket lookup to prevent port hijacking in the unprivileged port range.

>> The suggestion above would work for my use case, but there is another
>> possibility to make the same use cases possible: implement in BPF (or
>> allow BPF to call) the C and E steps above so the BPF program can
>> supplant the kernel behavior. I find this solution less elegant and it
>> might not work well in case there are multiple inet_lookup BPF programs
>> installed.

Having a BPF helper available to BPF sk_lookup programs that looks up a
socket by packet 4-tuple and netns ID in the TCP/UDP hashtables sounds
reasonable to me. You gain the flexibility that you describe without
adding code on the hot path.

[...]
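
To make the ordering discussed above concrete, here is a rough userspace
model of steps B to F. It is only a sketch: all of the names in it
(model_sock, model_table, model_hook, model_proxy_hook,
model_lookup_listener) are hypothetical and none of them is kernel or
libbpf API. Hooks D and F exist only as the proposal in this thread, and
a hook that returns NULL is assumed to mean "no selection, continue".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: every name below is hypothetical, not kernel API. */
struct model_sock {
    uint32_t addr; /* bound IPv4 address; 0 stands in for INADDR_ANY */
    uint16_t port; /* bound port */
    int id;        /* label used to tell sockets apart in the example */
};

/* A hook either selects a socket or returns NULL to let lookup continue. */
typedef const struct model_sock *(*model_hook)(uint32_t daddr, uint16_t dport);

struct model_table {
    const struct model_sock *listeners;
    size_t count;
    model_hook hook_b; /* position B: where sk_lookup programs run today */
    model_hook hook_d; /* position D: proposed in the thread */
    model_hook hook_f; /* position F: proposed in the thread */
};

/* Example hook: steer every new connection to one proxy socket. */
static const struct model_sock model_proxy = { 0, 0, 99 };

const struct model_sock *model_proxy_hook(uint32_t daddr, uint16_t dport)
{
    (void)daddr;
    (void)dport;
    return &model_proxy;
}

/* Exact match on (address, port) among the listening sockets. */
static const struct model_sock *model_find(const struct model_table *t,
                                           uint32_t addr, uint16_t port)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->listeners[i].addr == addr && t->listeners[i].port == port)
            return &t->listeners[i];
    return NULL;
}

/* Run a hook if one is attached at that position. */
static const struct model_sock *model_run(model_hook h, uint32_t daddr,
                                          uint16_t dport)
{
    return h ? h(daddr, dport) : NULL;
}

const struct model_sock *model_lookup_listener(const struct model_table *t,
                                               uint32_t daddr, uint16_t dport)
{
    const struct model_sock *sk;

    assert(t != NULL);
    /* Step A (established sockets) runs before this point; not modelled. */
    if ((sk = model_run(t->hook_b, daddr, dport)))  /* B: existing hook */
        return sk;
    if ((sk = model_find(t, daddr, dport)))         /* C: address + port */
        return sk;
    if ((sk = model_run(t->hook_d, daddr, dport)))  /* D: proposed hook */
        return sk;
    if ((sk = model_find(t, 0, dport)))             /* E: INADDR_ANY */
        return sk;
    return model_run(t->hook_f, daddr, dport);      /* F: proposed hook */
}
```

With one listener bound to 127.0.0.2:80 and another bound to
INADDR_ANY:443, attaching model_proxy_hook at B overrides even the
address-specific listener, which is the surprise described in the quoted
text. Attached at D it leaves 127.0.0.2:80 alone but wins over the
INADDR_ANY listener, and attached at F it only sees connections that no
listening socket claimed.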