Re: Question: Move BPF_SK_LOOKUP ahead of connected UDP sk lookup?

On Wed, Aug 21, 2024 at 07:44 PM +08, Philo Lu wrote:
> On 2024/8/21 17:23, Jakub Sitnicki wrote:
>> Hi Philo,
>> [CC Eric and Paolo who have more context than me here.]
>> On Tue, Aug 20, 2024 at 08:31 PM +08, Philo Lu wrote:
>>> Hi all, I wonder if it is feasible to move BPF_SK_LOOKUP ahead of connected UDP
>>> sk lookup?
>>>
> ...
>>>
>>> So is there any other problem with it? Otherwise I'll try to work on
>>> it and submit patches later.
>>>
>>> [0] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@xxxxxxxxxxxxxx/
>>>
>>> Thank you for your time.
>> It was done like that to maintain the connected UDP socket guarantees.
>> Similarly to the established TCP sockets. The contract is that if you
>> are bound to a 4-tuple, you will receive the packets destined to it.
>> 
>
> Thanks for your explanation. IIUC, bpf_sk_lookup was designed to skip
> connected socket lookup (established for TCP and connected for UDP), so
> it is not supposed to run before connected UDP lookup.
> (though it seems so close to solving our problem...)

Yes, correct. Motivation behind bpf_sk_lookup was to steer TCP
connections & UDP flows to listening / unconnected sockets, like you can
do with TPROXY [1].

Since it had nothing to do with established / connected sockets, we
added the BPF hook in such a way that they are unaffected by it.
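
To make that concrete, here is a minimal sk_lookup sketch (the map, the
port number, and the program name are made up for illustration): steer
every UDP flow arriving on one local port into a single unconnected
socket held in a sockmap, and return SK_PASS without an assignment for
everything else so the regular lookup still runs.

#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_SOCKMAP);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
} steer_dest SEC(".maps");

SEC("sk_lookup")
int steer_udp_443(struct bpf_sk_lookup *ctx)
{
        const __u32 zero = 0;
        struct bpf_sock *sk;
        int err;

        /* Only grab UDP flows to local port 443; everything else falls
         * through to the normal socket lookup.
         */
        if (ctx->protocol != IPPROTO_UDP || ctx->local_port != 443)
                return SK_PASS;

        sk = bpf_map_lookup_elem(&steer_dest, &zero);
        if (!sk)
                return SK_PASS;

        err = bpf_sk_assign(ctx, sk, 0);
        bpf_sk_release(sk);
        return err ? SK_DROP : SK_PASS;
}

char _license[] SEC("license") = "GPL";

Since the hook only runs after the established / connected lookup has
missed, a program like this never sees packets destined for a connected
UDP socket.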

>> It sounds like you are looking for an efficient way to lookup a
>> connected UDP socket. We would be interested in that as well. We use
>> connected UDP/QUIC on egress where we don't expect the peer to roam and
>> change its address. There's a memory cost on the kernel side to using
>> them, but they make it easier to structure your application, because you
>> can have roughly the same design for TCP and UDP transport.
>> 
> Yes, we have exactly the same problem.

Good to know that there are other users of connected UDP out there.

Loosely related - I'm planning to raise the question of whether using
connected UDP sockets on ingress makes sense for QUIC at Plumbers [2].
Connected UDP socket lookup performance is one of the aspects here.

>> So what if instead of doing it in BPF, we make it better for everyone
>> and introduce a hash table keyed by 4-tuple for connected sockets in the
>> udp stack itself (counterpart of ehash in tcp)?
>
> This solution is also OK with me. But I'm not sure whether there have
> been previous attempts or technical problems with it.
>
> In fact, I have done a simple test with 4-tuple UDP lookup, and it does
> make a difference:
> (kernel 5.10, 1000 connected UDP sockets on the server, sockperf sending
> messages to one of them, averaged over 5s)
>
> Without 4-tuple lookup:
>
> %Cpu0: 0.0 us, 0.0 sy, 0.0 ni,  0.0 id, 0.0 wa, 0.0 hi, 100.0 si, 0.0 st
> %Cpu1: 0.2 us, 0.2 sy, 0.0 ni, 99.4 id, 0.0 wa, 0.2 hi,   0.0 si, 0.0 st
> MiB Mem :7625.1 total,   6761.5 free,    210.2 used,    653.4 buff/cache
> MiB Swap:   0.0 total,      0.0 free,      0.0 used.   7176.2 avail Mem
>
> ---
> With 4-tuple lookup:
>
> %Cpu0: 0.2 us, 0.4 sy, 0.0 ni, 48.1 id, 0.0 wa, 1.2 hi, 50.1 si,  0.0 st
> %Cpu1: 0.6 us, 0.4 sy, 0.0 ni, 98.8 id, 0.0 wa, 0.2 hi,  0.0 si,  0.0 st
> MiB Mem :7625.1 total,   6759.9 free,    211.9 used,    653.3 buff/cache
> MiB Swap:   0.0 total,      0.0 free,      0.0 used.   7174.6 avail Mem

Right. The overhead is expected. All of the server's connected sockets
end up in one hash bucket, and we need to walk a long chain on lookup.
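
Just to illustrate the shape of it, here is a rough sketch of such a
4-tuple keyed table, loosely modelled on how TCP's ehash lookup works.
Everything named udp_conn_* below is made up for this sketch, not
existing kernel API:

#include <linux/jhash.h>
#include <net/sock.h>
#include <net/inet_sock.h>
#include <net/netns/hash.h>

struct udp_conn_hslot {
        struct hlist_head head;
        spinlock_t        lock;
};

static struct udp_conn_hslot *udp4_conn_hash;   /* allocated at boot */
static unsigned int udp4_conn_mask;

static u32 udp_conn_hashfn(const struct net *net, __be32 saddr, __be16 sport,
                           __be32 daddr, __be16 dport)
{
        /* Placeholder mix; a real version would fold in a boot-time
         * secret the way inet_ehashfn() does.
         */
        return jhash_3words((__force u32)saddr, (__force u32)daddr,
                            ((__force u32)sport << 16) | (__force u32)dport,
                            net_hash_mix(net));
}

/* Caller holds rcu_read_lock(). saddr/sport are the packet's source,
 * daddr/dport its destination, as in __udp4_lib_lookup().
 */
static struct sock *udp4_lookup_connected(const struct net *net,
                                          __be32 saddr, __be16 sport,
                                          __be32 daddr, __be16 dport,
                                          int dif)
{
        unsigned int h = udp_conn_hashfn(net, saddr, sport, daddr, dport);
        struct udp_conn_hslot *slot = &udp4_conn_hash[h & udp4_conn_mask];
        struct sock *sk;

        /* A bucket only holds sockets sharing the 4-tuple hash, so the
         * chain stays short even with thousands of connected sockets
         * bound to the same local address:port.
         */
        sk_for_each_rcu(sk, &slot->head) {
                const struct inet_sock *inet = inet_sk(sk);

                if (net_eq(sock_net(sk), net) &&
                    inet->inet_daddr == saddr &&
                    inet->inet_dport == sport &&
                    inet->inet_rcv_saddr == daddr &&
                    inet->inet_num == ntohs(dport) &&
                    (!sk->sk_bound_dev_if || sk->sk_bound_dev_if == dif))
                        return sk;
        }
        return NULL;    /* fall back to the existing hash2/hash lookup */
}

Insertion on connect() and removal on disconnect would be the other
half of the work, but the lookup side is where the long chain walk in
the numbers above goes away.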

The workaround is not "pretty". You have to configure your server to
receive on multiple IP addresses and/or ports :-/

[1] Which also respects established / connected sockets, as long as they
    have the IP_TRANSPARENT flag set. Users need to set it "manually" for
    UDP (see the snippet below).

[2] https://lpc.events/event/18/abstracts/2134/
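
The "manual" step for UDP is just a setsockopt before bind()/connect();
a quick userspace sketch, error handling trimmed, CAP_NET_ADMIN
required:

#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Connected UDP socket that keeps receiving its flow under TPROXY. */
static int transparent_udp_socket(void)
{
        int one = 1;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0)
                return -1;

        /* IP_TRANSPARENT needs CAP_NET_ADMIN; level SOL_IP == IPPROTO_IP. */
        if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0)
                perror("setsockopt(IP_TRANSPARENT)");

        /* ... bind() and connect() to pin the 4-tuple as usual ... */
        return fd;
}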




