On Wed, Aug 21, 2024 at 07:44 PM +08, Philo Lu wrote:
> On 2024/8/21 17:23, Jakub Sitnicki wrote:
>> Hi Philo,
>>
>> [CC Eric and Paolo who have more context than me here.]
>>
>> On Tue, Aug 20, 2024 at 08:31 PM +08, Philo Lu wrote:
>>> Hi all, I wonder if it is feasible to move BPF_SK_LOOKUP ahead of connected UDP
>>> sk lookup?
>>>
> ...
>>>
>>> So is there any other problem with it? Or I'll try to work on it and commit
>>> patches later.
>>>
>>> [0] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@xxxxxxxxxxxxxx/
>>>
>>> Thank you for your time.
>>
>> It was done like that to maintain the connected UDP socket guarantees.
>> Similarly to the established TCP sockets. The contract is that if you
>> are bound to a 4-tuple, you will receive the packets destined to it.
>>
>
> Thanks for your explanation. IIUC, bpf_sk_lookup was designed to skip
> connected socket lookup (established for TCP and connected for UDP), so
> it is not supposed to run before connected UDP lookup.
> (though it seems so close to solving our problem...)

Yes, correct. The motivation behind bpf_sk_lookup was to steer TCP
connections & UDP flows to listening / unconnected sockets, like you can
do with TPROXY [1]. Since it had nothing to do with established /
connected sockets, we added the BPF hook in such a way that they are
unaffected by it.

>> It sounds like you are looking for an efficient way to look up a
>> connected UDP socket. We would be interested in that as well. We use
>> connected UDP/QUIC on egress where we don't expect the peer to roam and
>> change its address. There's a memory cost on the kernel side to using
>> them, but they make it easier to structure your application, because you
>> can have roughly the same design for TCP and UDP transport.
>>
>
> Yes, we have exactly the same problem.

Good to know that there are other users of connected UDP out there.

Loosely related - I'm planning to raise the question of whether using
connected UDP sockets on ingress makes sense for QUIC at Plumbers [2].
Connected UDP lookup performance is one of the aspects here.

>> So what if instead of doing it in BPF, we make it better for everyone
>> and introduce a hash table keyed by 4-tuple for connected sockets in the
>> udp stack itself (counterpart of ehash in tcp)?
>
> This solution is also OK with me. But I'm not sure whether there are
> previous attempts or technical problems with it.
>
> In fact, I have done a simple test with 4-tuple UDP lookup, and it does
> make a difference:
> (kernel 5.10, 1000 connected UDP sockets on the server, using sockperf
> to send messages to one of them, averaged over 5s)
>
> Without 4-tuple lookup:
>
> %Cpu0: 0.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 100.0 si, 0.0 st
> %Cpu1: 0.2 us, 0.2 sy, 0.0 ni, 99.4 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
> MiB Mem : 7625.1 total, 6761.5 free, 210.2 used, 653.4 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 7176.2 avail Mem
>
> ---
> With 4-tuple lookup:
>
> %Cpu0: 0.2 us, 0.4 sy, 0.0 ni, 48.1 id, 0.0 wa, 1.2 hi, 50.1 si, 0.0 st
> %Cpu1: 0.6 us, 0.4 sy, 0.0 ni, 98.8 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
> MiB Mem : 7625.1 total, 6759.9 free, 211.9 used, 653.3 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 7174.6 avail Mem

Right. The overhead is expected. All of the server's connected sockets
end up in one hash bucket and we need to walk a long chain on lookup.
The workaround is not "pretty": you have to configure your server to
receive on multiple IP addresses and/or ports :-/
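
To make the 4-tuple idea a bit more concrete, here is a rough,
self-contained sketch in plain userspace C. It is not kernel code, and
all the names in it (tuple4, conn_sock, ehash_lookup, hash4) are made up
for illustration; it only shows the data structure: connected sockets
keyed by the full 4-tuple, so a lookup walks a chain that holds nothing
but genuine hash collisions.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The full 4-tuple of a connected UDP socket. */
struct tuple4 {
        uint32_t saddr, daddr;          /* remote and local IPv4 address */
        uint16_t sport, dport;          /* remote and local port */
};

struct conn_sock {
        struct tuple4 key;
        struct conn_sock *next;         /* chain within one bucket */
        int fd;                         /* stand-in for the real socket */
};

#define EHASH_BITS 8
#define EHASH_SIZE (1u << EHASH_BITS)

static struct conn_sock *ehash[EHASH_SIZE];

/* Any decent mixing function would do; this one is only for the sketch. */
static unsigned int hash4(const struct tuple4 *t)
{
        uint32_t h = t->saddr * 2654435761u ^ t->daddr * 40503u;

        h ^= ((uint32_t)t->sport << 16) | t->dport;
        h ^= h >> 16;
        return h & (EHASH_SIZE - 1);
}

static void ehash_insert(struct conn_sock *sk)
{
        unsigned int b = hash4(&sk->key);

        sk->next = ehash[b];
        ehash[b] = sk;
}

/* Cost is the length of one chain holding only true 4-tuple collisions. */
static struct conn_sock *ehash_lookup(const struct tuple4 *t)
{
        struct conn_sock *sk;

        for (sk = ehash[hash4(t)]; sk; sk = sk->next)
                if (!memcmp(&sk->key, t, sizeof(*t)))
                        return sk;
        return NULL;    /* would fall back to the existing port-based lookup */
}

int main(void)
{
        struct conn_sock a = {
                .key = { .saddr = 0x0a000001, .daddr = 0x0a000002,
                         .sport = 40000, .dport = 443 },
                .fd = 3,
        };
        struct tuple4 probe = a.key;
        struct conn_sock *hit;

        ehash_insert(&a);
        hit = ehash_lookup(&probe);
        printf("lookup: %s, fd=%d\n", hit ? "hit" : "miss", hit ? hit->fd : -1);
        return 0;
}

An unconnected or unmatched packet would still go through the existing
port-based lookup, so this would be purely an additional fast path for
connected sockets, much like ehash is for established TCP.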

[1] Which also respects established / connected sockets, as long as they
    have the IP_TRANSPARENT flag set. Users need to set it "manually" for
    UDP.

[2] https://lpc.events/event/18/abstracts/2134/