Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case

John Fastabend <john.fastabend@xxxxxxxxx> · Fri, 31 Jul 2020 15:46:50 -0700

Daniel Borkmann wrote:
> On 7/29/20 6:22 PM, John Fastabend wrote:
> > I had a sockmap program that after doing some refactoring started spewing
> > this splat at me:
> > 
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > [...]
> > [18610.807359] Call Trace:
> > [18610.807370]  ? 0xffffffffc114d0d5
> > [18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
> > [18610.807391]  tcp_connect+0x895/0xd50
> > [18610.807400]  tcp_v4_connect+0x465/0x4e0
> > [18610.807407]  __inet_stream_connect+0xd6/0x3a0
> > [18610.807412]  ? __inet_stream_connect+0x5/0x3a0
> > [18610.807417]  inet_stream_connect+0x3b/0x60
> > [18610.807425]  __sys_connect+0xed/0x120
> > 
> > After some debugging I was able to build this simple reproducer,
> > 
> >   __section("sockops/reproducer_bad")
> >   int bpf_reproducer_bad(struct bpf_sock_ops *skops)
> >   {
> >          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >          return 0;
> >   }
> > 
> > And along the way noticed that below program ran without splat,
> > 
> > __section("sockops/reproducer_good")
> > int bpf_reproducer_good(struct bpf_sock_ops *skops)
> > {
> >          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >          volatile __maybe_unused __u32 family;
> > 
> >          compiler_barrier();
> > 
> >          family = skops->family;
> >          return 0;
> > }
> > 
> > So I decided to check out the code we generate for the above two
> > programs and noticed each generates the BPF code you would expect,
> > 
> > 0000000000000000 <bpf_reproducer_bad>:
> > ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >         0:       r1 = *(u32 *)(r1 + 96)
> >         1:       *(u32 *)(r10 - 4) = r1
> > ;       return 0;
> >         2:       r0 = 0
> >         3:       exit
> > 
> > 0000000000000000 <bpf_reproducer_good>:
> > ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >         0:       r2 = *(u32 *)(r1 + 96)
> >         1:       *(u32 *)(r10 - 4) = r2
> > ;       family = skops->family;
> >         2:       r1 = *(u32 *)(r1 + 20)
> >         3:       *(u32 *)(r10 - 8) = r1
> > ;       return 0;
> >         4:       r0 = 0
> >         5:       exit
> > 
> > So we get reasonable assembly, but still something was causing the null
> > pointer dereference. So, we load the programs and dump the xlated version
> > observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
> > translated by the skops access helpers.
> > 
> > int bpf_reproducer_bad(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     0: (61) r1 = *(u32 *)(r1 +28)
> >     1: (15) if r1 == 0x0 goto pc+2
> >     2: (79) r1 = *(u64 *)(r1 +0)
> >     3: (61) r1 = *(u32 *)(r1 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     4: (63) *(u32 *)(r10 -4) = r1
> > ; return 0;
> >     5: (b7) r0 = 0
> >     6: (95) exit
> > 
> > int bpf_reproducer_good(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     0: (61) r2 = *(u32 *)(r1 +28)
> >     1: (15) if r2 == 0x0 goto pc+2
> >     2: (79) r2 = *(u64 *)(r1 +0)
> >     3: (61) r2 = *(u32 *)(r2 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     4: (63) *(u32 *)(r10 -4) = r2
> > ; family = skops->family;
> >     5: (79) r1 = *(u64 *)(r1 +0)
> >     6: (69) r1 = *(u16 *)(r1 +16)
> > ; family = skops->family;
> >     7: (63) *(u32 *)(r10 -8) = r1
> > ; return 0;
> >     8: (b7) r0 = 0
> >     9: (95) exit
> > 
> > Then we look at lines 0 and 2 above. In the good case we do the zero
> > check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
> > into the bpf_sock_ops check and we can confirm that is the 'struct
> > sock *sk' pointer field. But, in the bad case,
> > 
> >     0: (61) r1 = *(u32 *)(r1 +28)
> >     1: (15) if r1 == 0x0 goto pc+2
> >     2: (79) r1 = *(u64 *)(r1 +0)
> > 
> > Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
> > line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,
> > 
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > 
> > The 0x01 makes sense because that is exactly the fullsock value. And
> > its not a valid dereference so we splat.
> > 
> > To fix we need to guard the case when a program is doing a sock_ops field
> > access with src_reg == dst_reg. This is already handled in the load case
> > where the ctx_access handler uses a tmp register being careful to
> > store the old value and restore it. To fix the get case test if
> > src_reg == dst_reg and in this case do the is_fullsock test in the
> > temporary register. Remembering to restore the temporary register before
> > writing to either dst_reg or src_reg to avoid smashing the pointer into
> > the struct holding the tmp variable.
> > 
> > Adding this inline code to test_tcpbpf_kern will now be generated
> > correctly from,
> > 
> >    9: r2 = *(u32 *)(r2 + 96)
> > 
> > to xlated code,

I have this in my logs at line 12,

                *(u64 *)(r2 +32) = r9
> >    13: (61) r9 = *(u32 *)(r2 +28)
> >    14: (15) if r9 == 0x0 goto pc+4
> >    15: (79) r9 = *(u64 *)(r2 +32)
> >    16: (79) r2 = *(u64 *)(r2 +0)
> >    17: (61) r2 = *(u32 *)(r2 +2348)
> >    18: (05) goto pc+1
> >    19: (79) r9 = *(u64 *)(r2 +32)
> 
> The diff below looks good to me, but I'm confused on this one above. I'm probably
> missing something, but given this is the dst == src case with the r2 register, where
> in the dump do we first saves the content of r9 into the scratch tmp store?
> Line 19 seems to restore it, but the save is missing, no?
> 
> Please double check whether this was just omitted, but I would really like to have
> the commit message 100% correct as it otherwise causes confusion when we stare at it
> again a month later wrt what was the original intention.

off-by-one on the cut'n'paste into the commit message. Let me send a v3
with a correction to the commit. I do want this to be correct.