Andrii Nakryiko wrote: > On Wed, Nov 17, 2021 at 10:00 AM John Fastabend > <john.fastabend@xxxxxxxxx> wrote: > > > > Andrii Nakryiko wrote: > > > On Sun, Nov 14, 2021 at 9:21 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote: > > > > > > > > Alexei Starovoitov wrote: > > > > > test_maps is failing in bpf tree: > > > > > > > > > > $ ./test_maps > > > > > Failed sockmap recv > > > > > > > > > > and causing BPF CI to stay red. > > > > > > > > > > Since bpf-next is fine, I suspect it is one of John's or Jussi's patches. > > > > > > > > > > Please take a look. > > > > > > > > I'll look into it thanks. > > > > > > Any updates, John? Should we just disable test_maps in CI to make it > > > useful again? > > > > I'm debugging this now. Hopefully I'll have a fix shortly (today I hope). > > Maybe, it makes sense to wait for EOD and if I still don't have the fix > > disable it then. Anyways fixing it is top of list now. > > Sounds good, let's hope you find it and fix it today. OK got the fix, but its fairly subtle. Whats happening is when socks are removed from a map their programs are not actually being removed. They continue to live with the sock for the lifetime of the socket or until the last reference held from BPF side is lost. At which point all progs are dropped and socket returns to normal/preBPF state. We never noticed it on our real use cases because once we move sockets into BPF we never release them until the socket is free. The fix is to null the set progs and then do the update_sk_prot call which will decide based on the configured programs what proto ops need to be set to. So why did the sockmap test work earlier. Before I 'fixed' the logic to use a recv hook that didn't fall back into normal tcp recv handler the strparser was being detatched so no packets made it to the BPF layer and then eventually, with some delay and suboptimal code, we used to fall back to normal tcp stack recv hook. Packet would then end up being recieved via recv() and the test was happy. Per original patch its slightly racy to do this fallback because you don't know, maybe a packet shows up just as you do these checks and you consume it via old tcp handlers and skip over your parser or whatever BPF logic is there. I'll test the below tomorrow and create a commit msg so we have a chance at recalling details in a month or so. Its a bit subtle to send out this evening I think and would like to run it through some of our own Cilium tests. If your willing to wait another day we should be able to get the CI going again. Either way fix on its way soon just not tonight. --- diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 1ae52ac943f6..8eb671c827f9 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1124,6 +1124,8 @@ void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock) void sk_psock_stop_strp(struct sock *sk, struct sk_psock *psock) { + psock_set_prog(&psock->progs.stream_parser, NULL); + if (!psock->saved_data_ready) return; @@ -1212,6 +1214,9 @@ void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock) void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock) { + psock_set_prog(&psock->progs.stream_verdict, NULL); + psock_set_prog(&psock->progs.skb_verdict, NULL); + if (!psock->saved_data_ready) return; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index f39ef79ced67..4ca4b11f4e5f 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -167,8 +167,11 @@ static void sock_map_del_link(struct sock *sk, write_lock_bh(&sk->sk_callback_lock); if (strp_stop) sk_psock_stop_strp(sk, psock); - else + if (verdict_stop) sk_psock_stop_verdict(sk, psock); + + if (psock->psock_update_sk_prot) + psock->psock_update_sk_prot(sk, psock, false); write_unlock_bh(&sk->sk_callback_lock);