Re: [PATCH net] net: bpf: fix request_sock leak in filter.c

On Sat, Jun 11, 2022 at 3:58 AM Martin KaFai Lau <kafai@xxxxxx> wrote:
>
> On Fri, Jun 10, 2022 at 09:08:41AM +0200, Daniel Borkmann wrote:
> > On 6/10/22 2:17 AM, Martin KaFai Lau wrote:
> > > On Thu, Jun 09, 2022 at 10:29:15PM +0200, Daniel Borkmann wrote:
> > > > On 6/9/22 3:18 AM, Jon Maxwell wrote:
> > > > > A customer reported a request_socket leak in a Calico cloud environment. We
> > > > > found that a BPF program was doing a socket lookup which takes a refcnt on
> > > > > the socket and that it was finding the request_socket but returning the parent
> > > > > LISTEN socket via sk_to_full_sk() without decrementing the child request socket
> > > > > first, resulting in a request_sock slab object leak. This patch retains the
> > > Great catch and debug indeed!
> > >
> > > > > existing behaviour of returning full socks to the caller but it also decrements
> > > > > the child request_socket if one is present before doing so to prevent the leak.
> > > > >
> > > > > Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
> > > > > thanks to Antoine Tenart for the reproducer and patch input.
> > > > >
> > > > > Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
> > > > > Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
> > > Instead of the above commits, I think this dated back to
> > > 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
> > >
> > > > > Tested-by: Curtis Taylor <cutaylor-pub@xxxxxxxxx>
> > > > > Co-developed-by: Antoine Tenart <atenart@xxxxxxxxxx>
> > > > > Signed-off-by: Antoine Tenart <atenart@xxxxxxxxxx>
> > > > > Signed-off-by: Jon Maxwell <jmaxwell37@xxxxxxxxx>
> > > > > ---
> > > > >    net/core/filter.c | 20 ++++++++++++++------
> > > > >    1 file changed, 14 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index 2e32cee2c469..e3c04ae7381f 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -6202,13 +6202,17 @@ __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> > > > >    {
> > > > >         struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> > > > >                                            ifindex, proto, netns_id, flags);
> > > > > +       struct sock *sk1 = sk;
> > > > >         if (sk) {
> > > > >                 sk = sk_to_full_sk(sk);
> > > > > -               if (!sk_fullsock(sk)) {
> > > > > -                       sock_gen_put(sk);
> > > > > +               /* sk_to_full_sk() may return (sk)->rsk_listener, so make sure the original sk1
> > > > > +                * sock refcnt is decremented to prevent a request_sock leak.
> > > > > +                */
> > > > > +               if (!sk_fullsock(sk1))
> > > > > +                       sock_gen_put(sk1);
> > > > > +               if (!sk_fullsock(sk))
> > > In this case, sk1 == sk (timewait).  It is a bit worrying to pass
> > > sk to sk_fullsock(sk) after the above sock_gen_put().
> > > I think Daniel's 'if (sk2 != sk) { sock_gen_put(sk); }' check is better.
> > >
> > > > [ +Martin/Joe/Lorenz ]
> > > >
> > > > I wonder, should we also add some asserts in here to ensure we don't get an imbalance for the
> > > > bpf_sk_release() case later on? Rough pseudocode could be something like below:
> > > >
> > > > static struct sock *
> > > > __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> > > >                  struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
> > > >                  u64 flags)
> > > > {
> > > >          struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> > > >                                             ifindex, proto, netns_id, flags);
> > > >          if (sk) {
> > > >                  struct sock *sk2 = sk_to_full_sk(sk);
> > > >
> > > >                  if (!sk_fullsock(sk2))
> > > >                          sk2 = NULL;
> > > >                  if (sk2 != sk) {
> > > >                          sock_gen_put(sk);
> > > >                          if (unlikely(sk2 && !sock_flag(sk2, SOCK_RCU_FREE))) {
> > > I don't think it matters if the helper-returned sk2 is refcounted or not (SOCK_RCU_FREE).
> > > The verifier has ensured the bpf_sk_lookup() and bpf_sk_release() are
> > > always balanced regardless of the type of sk2.
> > >
> > > bpf_sk_release() will do the right thing to check the sk2 is refcounted or not
> > > before calling sock_gen_put().
> > >
> > > The bug here is the helper forgot to call sock_gen_put(sk) while
> > > the verifier only tracks the sk2, so I think the 'if (unlikely...) { WARN_ONCE(...); }'
> > > can be saved.
> >
> > I was mainly thinking that in sk_lookup() we have the check `sk && !refcounted &&
> > !sock_flag(sk, SOCK_RCU_FREE)` to catch an unreferenced non-SOCK_RCU_FREE socket, but
> > since sk_to_full_sk() can return inet_reqsk(sk)->rsk_listener, we don't have a similar
> > assertion there. Given we don't bump any ref on the latter, it must be SOCK_RCU_FREE then.
> Ah. got it.  Thanks for the explanation.
>
> Yep, agree.  It is useful to have this check here to ensure
> there is no need to bump the sk2 refcnt.  A comment may be useful
> here also: /* Ensure there is no need to bump sk2 refcnt */
>

I'll add that comment.

I'll add the SOCK_RCU_FREE check. We are currently testing the new patch
based on Daniel's recommendation. When that is complete I'll resubmit the next
version of the patch including that. It'll probably be a few days.
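
For anyone following along, the refcount flow discussed above can be sketched
as a small userspace model. This is only an illustration, not the kernel code
and not the next version of the patch: struct sock here is a toy type, the
rcu_free field stands in for SOCK_RCU_FREE, the listener field stands in for
rsk_listener, and skc_lookup(), sk_lookup_leaky(), sk_lookup_fixed() and the
warned counter are hypothetical names. The fixed variant follows the shape of
Daniel's pseudocode, with the counter standing in for a WARN_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of the refcount flow; NOT the kernel types. */
enum sk_kind { SK_FULL, SK_REQUEST, SK_TIMEWAIT };

struct sock {
	enum sk_kind kind;
	int refcnt;
	bool rcu_free;          /* assumption: stands in for SOCK_RCU_FREE */
	struct sock *listener;  /* assumption: stands in for rsk_listener */
};

static bool sk_fullsock(const struct sock *sk)
{
	return sk->kind == SK_FULL;
}

/* A request sock maps to its parent listener; anything else maps to itself. */
static struct sock *sk_to_full_sk(struct sock *sk)
{
	if (sk && sk->kind == SK_REQUEST)
		return sk->listener;
	return sk;
}

static void sock_gen_put(struct sock *sk)
{
	sk->refcnt--;
}

/* Hypothetical stand-in for the lookup: takes a ref on what it found. */
static struct sock *skc_lookup(struct sock *found)
{
	if (found)
		found->refcnt++;
	return found;
}

/* Model of the pre-patch flow: the ref on a request sock is never dropped. */
static struct sock *sk_lookup_leaky(struct sock *found)
{
	struct sock *sk = skc_lookup(found);

	if (sk) {
		sk = sk_to_full_sk(sk);
		if (!sk_fullsock(sk)) {
			sock_gen_put(sk);
			return NULL;
		}
	}
	return sk;
}

static int warned; /* counts what would be a WARN_ONCE() in the kernel */

/* Model of the flow suggested in this thread. */
static struct sock *sk_lookup_fixed(struct sock *found)
{
	struct sock *sk = skc_lookup(found);

	if (sk) {
		struct sock *sk2 = sk_to_full_sk(sk);

		if (!sk_fullsock(sk2))
			sk2 = NULL;
		if (sk2 != sk) {
			/* Drop the ref taken on the looked-up sock. */
			sock_gen_put(sk);
			/* Ensure there is no need to bump sk2 refcnt */
			if (sk2 && !sk2->rcu_free) {
				warned++;
				return NULL;
			}
		}
		sk = sk2;
	}
	return sk;
}
```

In this model a lookup that lands on a request sock leaves its refcnt one too
high with sk_lookup_leaky() and balanced with sk_lookup_fixed(), while the
timewait case never touches the sock again after the put.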

Regards

Jon

> Thanks!
>


