On Thu, Jul 02, 2020 at 11:24 AM CEST, Jakub Sitnicki wrote:
> Run a BPF program before looking up a listening socket on the receive path.
> Program selects a listening socket to yield as result of socket lookup by
> calling bpf_sk_assign() helper and returning BPF_REDIRECT (7) code.
>
> Alternatively, program can also fail the lookup by returning with
> BPF_DROP (1), or let the lookup continue as usual with BPF_OK (0) on
> return. Other return values are treated the same as BPF_OK.
>
> This lets the user match packets with listening sockets freely at the last
> possible point on the receive path, where we know that packets are destined
> for local delivery after undergoing policing, filtering, and routing.
>
> With BPF code selecting the socket, directing packets destined to an IP
> range or to a port range to a single socket becomes possible.
>
> In case multiple programs are attached, they are run in series in the order
> in which they were attached. The end result gets determined from return
> code from each program according to following rules.
>
>  1. If any program returned BPF_REDIRECT and selected a valid socket, this
>     socket will be used as result of the lookup.
>  2. If more than one program returned BPF_REDIRECT and selected a socket,
>     last selection takes effect.
>  3. If any program returned BPF_DROP and none returned BPF_REDIRECT, the
>     socket lookup will fail with -ECONNREFUSED.
>  4. If no program returned neither BPF_DROP nor BPF_REDIRECT, socket lookup
>     continues to htable-based lookup.

Lorenz suggested that we cut down the allowed return values to just
BPF_OK (pass) or BPF_DROP, and get rid of BPF_REDIRECT.

Instead of returning BPF_REDIRECT, the BPF program will select a socket
with bpf_sk_assign() and return BPF_OK. Also, the program will be able to
discard the socket it has selected by passing NULL to bpf_sk_assign().

This requires a slight change to the verifier in order to support an
argument type that is a pointer to a full socket or NULL.
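For comparison, rules 1-4 from the quoted description can be modeled in
plain userspace C roughly as below. This is only a sketch and not the
kernel code: struct prog_result and combine_results() are hypothetical
names made up for illustration, and treating a BPF_REDIRECT that comes
without a valid socket the same as BPF_OK is an assumption, since the
description does not spell that case out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Return codes, numbered as in the quoted description. */
enum { BPF_OK = 0, BPF_DROP = 1, BPF_REDIRECT = 7 };

/* Hypothetical record of what one attached program did. */
struct prog_result {
	int ret;              /* value the program returned */
	const void *selected; /* socket given to bpf_sk_assign(), or NULL */
};

/*
 * Combine results of programs that ran in attach order.
 *
 * Returns 0 with *sk set to the chosen socket (rules 1 and 2),
 * -ECONNREFUSED when the lookup is dropped (rule 3), or 0 with
 * *sk == NULL meaning "continue with the regular lookup" (rule 4).
 */
static int combine_results(const struct prog_result *res, int n,
			   const void **sk)
{
	int dropped = 0;

	assert(n >= 0 && sk != NULL);
	*sk = NULL;

	for (int i = 0; i < n; i++) {
		if (res[i].ret == BPF_REDIRECT && res[i].selected != NULL)
			*sk = res[i].selected; /* last valid selection wins */
		else if (res[i].ret == BPF_DROP)
			dropped = 1;
		/* Assumption: anything else counts as BPF_OK. */
	}

	if (*sk != NULL)
		return 0; /* a redirect overrides any drop */
	return dropped ? -ECONNREFUSED : 0;
}
```

Note how a single BPF_REDIRECT overrides a BPF_DROP from any other
program in the chain, which is part of what the simplified semantics
would no longer need to express through return codes.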
These simplified semantics seem very attractive. They make the new type
of program behave like a filter that, in its basic form, can simply pass
or drop connection requests, with the key ability to select an
alternative socket to handle the connection request when bpf_sk_assign()
gets called.

It is also closer to how redirection in TC BPF, SK_SKB and SK_REUSEPORT
programs works. There is no REDIRECT return code expectation there.

We can even go a step further and adopt SK_PASS / SK_DROP as return
values, instead of BPF_OK / BPF_DROP, as they are already in use by
SK_SKB and SK_REUSEPORT programs.

[...]
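To illustrate the simplified semantics for a single program, here is a
small userspace model in plain C. It is a sketch under assumptions, not
the proposed kernel implementation: struct lookup_ctx, model_sk_assign()
and resolve() are hypothetical names, the enum is redefined locally
rather than pulled from the UAPI headers, and letting SK_DROP win even
when a socket has been selected is an assumption.

```c
#include <stddef.h>

/* Local copy of the verdicts for this model only. */
enum sk_action { SK_DROP = 0, SK_PASS = 1 };

/* Hypothetical per-lookup state a program can modify. */
struct lookup_ctx {
	const void *selected; /* socket picked by the program, or NULL */
};

/* Model of bpf_sk_assign(): passing NULL discards an earlier selection. */
static void model_sk_assign(struct lookup_ctx *ctx, const void *sk)
{
	ctx->selected = sk;
}

enum lookup_outcome {
	LOOKUP_REFUSED,      /* connection request dropped */
	LOOKUP_USE_SELECTED, /* deliver to the socket the program chose */
	LOOKUP_CONTINUE,     /* fall through to the regular lookup */
};

/* Only pass/drop is returned; the selection travels in the context. */
static enum lookup_outcome resolve(enum sk_action verdict,
				   const struct lookup_ctx *ctx)
{
	if (verdict == SK_DROP)
		return LOOKUP_REFUSED; /* assumption: drop beats selection */
	return ctx->selected ? LOOKUP_USE_SELECTED : LOOKUP_CONTINUE;
}
```

The return value carries only the filter decision, while socket selection
becomes a side effect of the helper call, so no dedicated REDIRECT code
is needed.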