Re: sockmap test is broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andrii Nakryiko wrote:
> On Wed, Nov 17, 2021 at 10:00 AM John Fastabend
> <john.fastabend@xxxxxxxxx> wrote:
> >
> > Andrii Nakryiko wrote:
> > > On Sun, Nov 14, 2021 at 9:21 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> > > >
> > > > Alexei Starovoitov wrote:
> > > > > test_maps is failing in bpf tree:
> > > > >
> > > > > $ ./test_maps
> > > > > Failed sockmap recv
> > > > >
> > > > > and causing BPF CI to stay red.
> > > > >
> > > > > Since bpf-next is fine, I suspect it is one of John's or Jussi's patches.
> > > > >
> > > > > Please take a look.
> > > >
> > > > I'll look into it thanks.
> > >
> > > Any updates, John? Should we just disable test_maps in CI to make it
> > > useful again?
> >
> > I'm debugging this now. Hopefully I'll have a fix shortly (today I hope).
> > Maybe, it makes sense to wait for EOD and if I still don't have the fix
> > disable it then. Anyways fixing it is top of list now.
> 
> Sounds good, let's hope you find it and fix it today.

OK got the fix, but its fairly subtle. Whats happening is when socks are
removed from a map their programs are not actually being removed. They
continue to live with the sock for the lifetime of the socket or until
the last reference held from BPF side is lost. At which point all progs
are dropped and socket returns to normal/preBPF state. We never noticed
it on our real use cases because once we move sockets into BPF we never
release them until the socket is free. The fix is to null the set progs
and then do the update_sk_prot call which will decide based on the
configured programs what proto ops need to be set to.

So why did the sockmap test work earlier. Before I 'fixed' the logic to
use a recv hook that didn't fall back into normal tcp recv handler the
strparser was being detatched so no packets made it to the BPF layer and
then eventually, with some delay and suboptimal code, we used to fall back
to normal tcp stack recv hook. Packet would then end up being recieved
via recv() and the test was happy. Per original patch its slightly racy
to do this fallback because you don't know, maybe a packet shows up
just as you do these checks and you consume it via old tcp handlers
and skip over your parser or whatever BPF logic is there.

I'll test the below tomorrow and create a commit msg so we have a chance
at recalling details in a month or so. Its a bit subtle to send out
this evening I think and would like to run it through some of our
own Cilium tests.

If your willing to wait another day we should be able to get the
CI going again. Either way fix on its way soon just not tonight.

---

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 1ae52ac943f6..8eb671c827f9 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1124,6 +1124,8 @@ void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock)
 
 void sk_psock_stop_strp(struct sock *sk, struct sk_psock *psock)
 {
+       psock_set_prog(&psock->progs.stream_parser, NULL);
+
        if (!psock->saved_data_ready)
                return;
 
@@ -1212,6 +1214,9 @@ void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
 
 void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock)
 {
+       psock_set_prog(&psock->progs.stream_verdict, NULL);
+       psock_set_prog(&psock->progs.skb_verdict, NULL);
+
        if (!psock->saved_data_ready)
                return;
 
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index f39ef79ced67..4ca4b11f4e5f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -167,8 +167,11 @@ static void sock_map_del_link(struct sock *sk,
                write_lock_bh(&sk->sk_callback_lock);
                if (strp_stop)
                        sk_psock_stop_strp(sk, psock);
-               else
+               if (verdict_stop)
                        sk_psock_stop_verdict(sk, psock);
+
+               if (psock->psock_update_sk_prot)
+                       psock->psock_update_sk_prot(sk, psock, false);
                write_unlock_bh(&sk->sk_callback_lock);



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux