Re: [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets

Daan De Meyer <daan.j.demeyer@xxxxxxxxx> · Tue, 13 Dec 2022 11:36:33 +0000

> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> > These hooks allow intercepting bind(), connect(), getsockname(),
> > getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> > socket hooks get write access to the address length because the
> > address length is not fixed when dealing with unix sockets and
> > needs to be modified when a unix socket address is modified by
> > the hook. Because abstract socket unix addresses start with a
> > NUL byte, we cannot recalculate the socket address in kernelspace
> > after running the hook by calculating the length of the unix socket
> > path using strlen().
>
> Yes, we cannot calculate the socket path length with
> strlen(), but we still have a method to find the path. In
> unix_seq_show(), the unix socket path is calculated as below,
>
>                  if (u->addr) {  // under a hash table lock here
>                          int i, len;
>                          seq_putc(seq, ' ');
>
>                          i = 0;
>                          len = u->addr->len -
>                                  offsetof(struct sockaddr_un, sun_path);
>                          if (u->addr->name->sun_path[0]) {
>                                  len--;
>                          } else {
>                                  seq_putc(seq, '@');
>                                  i++;
>                          }
>                          for ( ; i < len; i++)
>                                  seq_putc(seq, u->addr->name->sun_path[i] ?:
>                                           '@');
>                  }
>
> Is it possible that we can use the above method to find the
> address length so we won't need to pass uaddr_len to bpf program?
>
> Since all other hooks do not need uaddr_len, you could add some
> new hooks for unix sockets which can specially calculate uaddr_len
> after the bpf program runs.

I don't think we can. Here is the definition of an abstract unix
socket address from the unix(7) man page:

> abstract: an abstract socket address is distinguished (from a
> pathname socket) by the fact that sun_path[0] is a null byte ('\0').
> The socket's address in this namespace is given by the additional
> bytes in sun_path that are covered by the specified length of the
> address structure. (Null bytes in the name have no special
> significance.) The name has no connection with filesystem pathnames.
> When the address of an abstract socket is returned, the returned
> addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2),
> and the name of the socket is contained in the first (addrlen -
> sizeof(sa_family_t)) bytes of sun_path.

This specifically says that the address in the abstract namespace is
given by the additional bytes in sun_path that are covered by the
length of the address structure. If I understand correctly, that means
there's no way to derive the length from the contents of the sockaddr
structure alone. We need the actual length as specified by the caller
to know which bytes belong to the address. The unix_seq_show() code
quoted above does not derive the length either: it reads the stored
length from u->addr->len. Note that it's valid for the abstract name
to contain NUL bytes, so we cannot use them to detect whether further
bytes belong to the address or not. It seems valid to have an abstract
name consisting of 107 NUL bytes in sun_path.

On Tue, 13 Dec 2022 at 06:20, Yonghong Song <yhs@xxxxxxxx> wrote:
>
>
>
> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> > These hooks allow intercepting bind(), connect(), getsockname(),
> > getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> > socket hooks get write access to the address length because the
> > address length is not fixed when dealing with unix sockets and
> > needs to be modified when a unix socket address is modified by
> > the hook. Because abstract socket unix addresses start with a
> > NUL byte, we cannot recalculate the socket address in kernelspace
> > after running the hook by calculating the length of the unix socket
> > path using strlen().
>
> Yes, we cannot calculate the socket path length with
> strlen(), but we still have a method to find the path. In
> unix_seq_show(), the unix socket path is calculated as below,
>
>                  if (u->addr) {  // under a hash table lock here
>                          int i, len;
>                          seq_putc(seq, ' ');
>
>                          i = 0;
>                          len = u->addr->len -
>                                  offsetof(struct sockaddr_un, sun_path);
>                          if (u->addr->name->sun_path[0]) {
>                                  len--;
>                          } else {
>                                  seq_putc(seq, '@');
>                                  i++;
>                          }
>                          for ( ; i < len; i++)
>                                  seq_putc(seq, u->addr->name->sun_path[i] ?:
>                                           '@');
>                  }
>
> Is it possible that we can use the above method to find the
> address length so we won't need to pass uaddr_len to bpf program?
>
> Since all other hooks do not need uaddr_len, you could add some
> new hooks for unix sockets which can specially calculate uaddr_len
> after the bpf program runs.
>
> >
> > This hook can be used when users want to multiplex syscall to a
> > single unix socket to multiple different processes behind the scenes
> > by redirecting the connect() and other syscalls to process specific
> > sockets.
> > ---
> >   include/linux/bpf-cgroup-defs.h |  6 +++
> >   include/linux/bpf-cgroup.h      | 29 ++++++++++-
> >   include/uapi/linux/bpf.h        | 14 ++++--
> >   kernel/bpf/cgroup.c             | 11 ++++-
> >   kernel/bpf/syscall.c            | 18 +++++++
> >   kernel/bpf/verifier.c           |  7 ++-
> >   net/core/filter.c               | 45 +++++++++++++++--
> >   net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
> >   tools/include/uapi/linux/bpf.h  | 14 ++++--
> >   9 files changed, 204 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > index 7b121bd780eb..8196ccb81915 100644
> > --- a/include/linux/bpf-cgroup-defs.h
> > +++ b/include/linux/bpf-cgroup-defs.h
> > @@ -26,21 +26,27 @@ enum cgroup_bpf_attach_type {
> >       CGROUP_DEVICE,
> >       CGROUP_INET4_BIND,
> >       CGROUP_INET6_BIND,
> > +     CGROUP_UNIX_BIND,
> >       CGROUP_INET4_CONNECT,
> >       CGROUP_INET6_CONNECT,
> > +     CGROUP_UNIX_CONNECT,
> >       CGROUP_INET4_POST_BIND,
> >       CGROUP_INET6_POST_BIND,
> >       CGROUP_UDP4_SENDMSG,
> >       CGROUP_UDP6_SENDMSG,
> > +     CGROUP_UNIX_SENDMSG,
> >       CGROUP_SYSCTL,
> >       CGROUP_UDP4_RECVMSG,
> >       CGROUP_UDP6_RECVMSG,
> > +     CGROUP_UNIX_RECVMSG,
> >       CGROUP_GETSOCKOPT,
> >       CGROUP_SETSOCKOPT,
> >       CGROUP_INET4_GETPEERNAME,
> >       CGROUP_INET6_GETPEERNAME,
> > +     CGROUP_UNIX_GETPEERNAME,
> >       CGROUP_INET4_GETSOCKNAME,
> >       CGROUP_INET6_GETSOCKNAME,
> > +     CGROUP_UNIX_GETSOCKNAME,
> >       CGROUP_INET_SOCK_RELEASE,
> >       CGROUP_LSM_START,
> >       CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index 3ab2f06ddc8a..4de3016f01e4 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -46,21 +46,27 @@ to_cgroup_bpf_attach_type(enum bpf_attach_type attach_type)
> >       CGROUP_ATYPE(CGROUP_DEVICE);
> >       CGROUP_ATYPE(CGROUP_INET4_BIND);
> >       CGROUP_ATYPE(CGROUP_INET6_BIND);
> > +     CGROUP_ATYPE(CGROUP_UNIX_BIND);
> >       CGROUP_ATYPE(CGROUP_INET4_CONNECT);
> >       CGROUP_ATYPE(CGROUP_INET6_CONNECT);
> > +     CGROUP_ATYPE(CGROUP_UNIX_CONNECT);
> >       CGROUP_ATYPE(CGROUP_INET4_POST_BIND);
> >       CGROUP_ATYPE(CGROUP_INET6_POST_BIND);
> >       CGROUP_ATYPE(CGROUP_UDP4_SENDMSG);
> >       CGROUP_ATYPE(CGROUP_UDP6_SENDMSG);
> > +     CGROUP_ATYPE(CGROUP_UNIX_SENDMSG);
> >       CGROUP_ATYPE(CGROUP_SYSCTL);
> >       CGROUP_ATYPE(CGROUP_UDP4_RECVMSG);
> >       CGROUP_ATYPE(CGROUP_UDP6_RECVMSG);
> > +     CGROUP_ATYPE(CGROUP_UNIX_RECVMSG);
> >       CGROUP_ATYPE(CGROUP_GETSOCKOPT);
> >       CGROUP_ATYPE(CGROUP_SETSOCKOPT);
> >       CGROUP_ATYPE(CGROUP_INET4_GETPEERNAME);
> >       CGROUP_ATYPE(CGROUP_INET6_GETPEERNAME);
> > +     CGROUP_ATYPE(CGROUP_UNIX_GETPEERNAME);
> >       CGROUP_ATYPE(CGROUP_INET4_GETSOCKNAME);
> >       CGROUP_ATYPE(CGROUP_INET6_GETSOCKNAME);
> > +     CGROUP_ATYPE(CGROUP_UNIX_GETSOCKNAME);
> >       CGROUP_ATYPE(CGROUP_INET_SOCK_RELEASE);
> >       default:
> >               return CGROUP_BPF_ATTACH_TYPE_INVALID;
> > @@ -273,9 +279,13 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
> >               __ret;                                                       \
> >       })
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen)                      \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_BIND, NULL)
> > +
> >   #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)                                 \
> >       ((cgroup_bpf_enabled(CGROUP_INET4_CONNECT) ||                  \
> > -       cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) &&                 \
> > +       cgroup_bpf_enabled(CGROUP_INET6_CONNECT) ||                  \
> > +       cgroup_bpf_enabled(CGROUP_UNIX_CONNECT)) &&                  \
> >        (sk)->sk_prot->pre_connect)
> >
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen)                     \
> > @@ -284,24 +294,36 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen)                     \
> >       BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen)                       \
> > +     BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT)
> > +
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen)        \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
> >
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen)        \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen)          \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT, NULL)
> > +
> >   #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
> >
> >   #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)    \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_SENDMSG, t_ctx)
> > +
> >   #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen)          \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
> >
> >   #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen)          \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen)           \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_RECVMSG, NULL)
> > +
> >   /* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a
> >    * fullsock and its parent fullsock cannot be traced by
> >    * sk_to_full_sk().
> > @@ -487,16 +509,21 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
> >   #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype, flags) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(atype, major, minor, access) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos) ({ 0; })
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 9e3c33f83bba..b73e4da458fd 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -999,17 +999,21 @@ enum bpf_attach_type {
> >       BPF_SK_MSG_VERDICT,
> >       BPF_CGROUP_INET4_BIND,
> >       BPF_CGROUP_INET6_BIND,
> > +     BPF_CGROUP_UNIX_BIND,
> >       BPF_CGROUP_INET4_CONNECT,
> >       BPF_CGROUP_INET6_CONNECT,
> > +     BPF_CGROUP_UNIX_CONNECT,
> >       BPF_CGROUP_INET4_POST_BIND,
> >       BPF_CGROUP_INET6_POST_BIND,
> >       BPF_CGROUP_UDP4_SENDMSG,
> >       BPF_CGROUP_UDP6_SENDMSG,
> > +     BPF_CGROUP_UNIX_SENDMSG,
> >       BPF_LIRC_MODE2,
> >       BPF_FLOW_DISSECTOR,
> >       BPF_CGROUP_SYSCTL,
> >       BPF_CGROUP_UDP4_RECVMSG,
> >       BPF_CGROUP_UDP6_RECVMSG,
> > +     BPF_CGROUP_UNIX_RECVMSG,
> >       BPF_CGROUP_GETSOCKOPT,
> >       BPF_CGROUP_SETSOCKOPT,
> >       BPF_TRACE_RAW_TP,
> > @@ -1020,8 +1024,10 @@ enum bpf_attach_type {
> >       BPF_TRACE_ITER,
> >       BPF_CGROUP_INET4_GETPEERNAME,
> >       BPF_CGROUP_INET6_GETPEERNAME,
> > +     BPF_CGROUP_UNIX_GETPEERNAME,
> >       BPF_CGROUP_INET4_GETSOCKNAME,
> >       BPF_CGROUP_INET6_GETSOCKNAME,
> > +     BPF_CGROUP_UNIX_GETSOCKNAME,
> >       BPF_XDP_DEVMAP,
> >       BPF_CGROUP_INET_SOCK_RELEASE,
> >       BPF_XDP_CPUMAP,
>
> This is uapi. Please add new attach type to the end of enum type.
>
> > @@ -2575,8 +2581,8 @@ union bpf_attr {
> >    *          *bpf_socket* should be one of the following:
> >    *
> >    *          * **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
> > - *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
> > - *             and **BPF_CGROUP_INET6_CONNECT**.
> > + *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
> > + *             **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
> >    *
> >    *          This helper actually implements a subset of **setsockopt()**.
> >    *          It supports the following *level*\ s:
> > @@ -2809,8 +2815,8 @@ union bpf_attr {
> >    *          *bpf_socket* should be one of the following:
> >    *
> >    *          * **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
> > - *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
> > - *             and **BPF_CGROUP_INET6_CONNECT**.
> > + *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
> > + *             **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
> >    *
> >    *          This helper actually implements a subset of **getsockopt()**.
> >    *          It supports the same set of *optname*\ s that is supported by
> [...]