Re: [Patch bpf-next v2 2/9] sock: introduce sk_prot->update_proto()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 2, 2021 at 8:22 AM Lorenz Bauer <lmb@xxxxxxxxxxxxxx> wrote:
>
> On Tue, 2 Mar 2021 at 02:37, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> ...
> >  static inline void sk_psock_restore_proto(struct sock *sk,
> >                                           struct sk_psock *psock)
> >  {
> >         sk->sk_prot->unhash = psock->saved_unhash;
>
> Not related to your patch set, but why do an extra restore of
> sk_prot->unhash here? At this point sk->sk_prot is one of our tcp_bpf
> / udp_bpf protos, so overwriting that seems wrong?

Good catch. It seems you are right, but I need a double check. And
yes, it is completely unrelated to my patch, as the current code has
the same problem.

>
> > -       if (inet_csk_has_ulp(sk)) {
> > -               tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
> > -       } else {
> > -               sk->sk_write_space = psock->saved_write_space;
> > -               /* Pairs with lockless read in sk_clone_lock() */
> > -               WRITE_ONCE(sk->sk_prot, psock->sk_proto);
> > -       }
> > +       if (psock->saved_update_proto)
> > +               psock->saved_update_proto(sk, true);
> >  }
> >
> >  static inline void sk_psock_set_state(struct sk_psock *psock,
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 636810ddcd9b..0e8577c917e8 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1184,6 +1184,9 @@ struct proto {
> >         void                    (*unhash)(struct sock *sk);
> >         void                    (*rehash)(struct sock *sk);
> >         int                     (*get_port)(struct sock *sk, unsigned short snum);
> > +#ifdef CONFIG_BPF_SYSCALL
> > +       int                     (*update_proto)(struct sock *sk, bool restore);
>
> Kind of a nit, but this name suggests that the callback is a lot more
> generic than it really is. The only thing you can use it for is to
> prep the socket to be sockmap ready since we hardwire sockmap_unhash,
> etc. It's also not at all clear that this only works if sk has an
> sk_psock associated with it. Calling it without one would crash the
> kernel since the update_proto functions don't check for !sk_psock.
>
> Might as well call it install_sockmap_hooks or something and have a
> valid sk_psock be passed in to the callback. Additionally, I'd prefer

For the name, sure, I am always open to better names. Not sure if
'install_sockmap_hooks' is a good name, I also want to express we
are overriding sk_prot. How about 'psock_update_sk_prot'?


> if the function returned a struct proto * like it does at the moment.
> That way we keep sk->sk_prot manipulation confined to the sockmap code
> and don't have to copy paste it into every proto.

Well, TCP seems too special to do this, as it could call tcp_update_ulp()
to update the proto.

>
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 3bddd9dd2da2..13d2af5bb81c 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw)
> >
> >  static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock)
> >  {
> > -       struct proto *prot;
> > -
> > -       switch (sk->sk_type) {
> > -       case SOCK_STREAM:
> > -               prot = tcp_bpf_get_proto(sk, psock);
> > -               break;
> > -
> > -       case SOCK_DGRAM:
> > -               prot = udp_bpf_get_proto(sk, psock);
> > -               break;
> > -
> > -       default:
> > +       if (!sk->sk_prot->update_proto)
> >                 return -EINVAL;
> > -       }
> > -
> > -       if (IS_ERR(prot))
> > -               return PTR_ERR(prot);
> > -
> > -       sk_psock_update_proto(sk, psock, prot);
> > -       return 0;
> > +       psock->saved_update_proto = sk->sk_prot->update_proto;
> > +       return sk->sk_prot->update_proto(sk, false);
>
> I think reads / writes from sk_prot need READ_ONCE / WRITE_ONCE. We've
> not been diligent about this so far, but I think it makes sense to be
> careful in new code.

Hmm, there are many places not using READ_ONCE/WRITE_ONCE,
for a quick example:

void sock_map_unhash(struct sock *sk)
{
        void (*saved_unhash)(struct sock *sk);
        struct sk_psock *psock;

        rcu_read_lock();
        psock = sk_psock(sk);
        if (unlikely(!psock)) {
                rcu_read_unlock();
                if (sk->sk_prot->unhash)
                        sk->sk_prot->unhash(sk);
                return;
        }

        saved_unhash = psock->saved_unhash;
        sock_map_remove_links(sk, psock);
        rcu_read_unlock();
        saved_unhash(sk);
}

Thanks.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux