Re: [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets

Daan De Meyer <daan.j.demeyer@xxxxxxxxx> · Thu, 15 Dec 2022 14:34:23 +0000

> >> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> >>> These hooks allows intercepting bind(), connect(), getsockname(),
> >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> >>> socket hooks get write access to the address length because the
> >>> address length is not fixed when dealing with unix sockets and
> >>> needs to be modified when a unix socket address is modified by
> >>> the hook. Because abstract socket unix addresses start with a
> >>> NUL byte, we cannot recalculate the socket address in kernelspace
> >>> after running the hook by calculating the length of the unix socket
> >>> path using strlen().
> >>
> >> Yes, although we cannot calculate the socket path length with
> >> strlen(). But we still have a method to find the path. In
> >> unix_seq_show(), the unix socket path is calculated as below,
> >>
> >>                   if (u->addr) {  // under a hash table lock here
> >>                           int i, len;
> >>                           seq_putc(seq, ' ');
> >>
> >>                           i = 0;
> >>                           len = u->addr->len -
> >>                                   offsetof(struct sockaddr_un, sun_path);
> >>                           if (u->addr->name->sun_path[0]) {
> >>                                   len--;
> >>                           } else {
> >>                                   seq_putc(seq, '@');
> >>                                   i++;
> >>                           }
> >>                           for ( ; i < len; i++)
> >>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
> >>                                            '@');
> >>                   }
> >>
> >> Is it possible that we can use the above method to find the
> >> address length so we won't need to pass uaddr_len to bpf program?
> >>
> >> Since all other hooks do not need to uaddr_len, you could add some
> >> new hooks for unix socket which can specially calculate uaddr_len
> >> after the bpf program run.
> >
> > I don't think we can. If we look at the definition of abstract unix
> > socket in the official man page:
> >
> >> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
> >> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
> >> sizeof(sa_family_t)) bytes of sun_path.
> >
> > This specifically says that the address in the abstract namespace is
> > given by the additional bytes in sun_path that are covered by the
> > length of the address structure. If I understand correctly, that means
> > there's no way to derive the length from just the contents of the
> > sockaddr structure. We need
> > the actual length as specified by the caller to know which bytes
> > belong to the address. Note that it's valid for the abstract name to
> > contain Null bytes, so we cannot use those in any way or form to
> > detect whether further bytes belong to the address or not. It seems
> > valid to have an abstract name
> > consisting of 107 Null bytes in sun_path.
>
> Okay, it looks like bpf program is able to set abstract name as well.
> It would be good we have an example for this in selftest.
>
> With abstract address setable by bpf program, I guess you are right,
> we have to let user to explicitly tell us the address length.
>
> I assume it is possible for user to write an address like below:
> "a\0b\0"
> addr_len = offsetof(struct sockaddr_un, sun_path) + 4
> but actually it is illegal, right? We have to validate the
> legality of sun_path/addr_len beyond unix_validate_addr(), right?

This is not actually illegal according to the man page I think, let's
look at the following quote from the man page:

>  Pathname sockets
>      When binding a socket to a pathname, a few rules should be observed for maximum portability and ease of coding:
>
>      *  The pathname in sun_path should be null-terminated.
>
>      *  The length of the pathname, including the terminating null byte, should not exceed the size of sun_path.
>
>      *  The addrlen argument that describes the enclosing sockaddr_un structure should have a value of at least:
>
>             offsetof(struct sockaddr_un, sun_path)+strlen(addr.sun_path)+1
>
>         or, more simply, addrlen can be specified as sizeof(struct sockaddr_un).

So when doing a pathname based path, the address length is allowed to
be bigger than the actual path. So I don't think
we need to do any more validation than what is done by
unix_validate_addr(). The selftests are already using abstract
unix sockets because they don't need any cleanup.

On Tue, 13 Dec 2022 at 21:54, Yonghong Song <yhs@xxxxxxxx> wrote:
>
>
>
> On 12/13/22 3:36 AM, Daan De Meyer wrote:
> >> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> >>> These hooks allows intercepting bind(), connect(), getsockname(),
> >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> >>> socket hooks get write access to the address length because the
> >>> address length is not fixed when dealing with unix sockets and
> >>> needs to be modified when a unix socket address is modified by
> >>> the hook. Because abstract socket unix addresses start with a
> >>> NUL byte, we cannot recalculate the socket address in kernelspace
> >>> after running the hook by calculating the length of the unix socket
> >>> path using strlen().
> >>
> >> Yes, although we cannot calculate the socket path length with
> >> strlen(). But we still have a method to find the path. In
> >> unix_seq_show(), the unix socket path is calculated as below,
> >>
> >>                   if (u->addr) {  // under a hash table lock here
> >>                           int i, len;
> >>                           seq_putc(seq, ' ');
> >>
> >>                           i = 0;
> >>                           len = u->addr->len -
> >>                                   offsetof(struct sockaddr_un, sun_path);
> >>                           if (u->addr->name->sun_path[0]) {
> >>                                   len--;
> >>                           } else {
> >>                                   seq_putc(seq, '@');
> >>                                   i++;
> >>                           }
> >>                           for ( ; i < len; i++)
> >>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
> >>                                            '@');
> >>                   }
> >>
> >> Is it possible that we can use the above method to find the
> >> address length so we won't need to pass uaddr_len to bpf program?
> >>
> >> Since all other hooks do not need to uaddr_len, you could add some
> >> new hooks for unix socket which can specially calculate uaddr_len
> >> after the bpf program run.
> >
> > I don't think we can. If we look at the definition of abstract unix
> > socket in the official man page:
> >
> >> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
> >> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
> >> sizeof(sa_family_t)) bytes of sun_path.
> >
> > This specifically says that the address in the abstract namespace is
> > given by the additional bytes in sun_path that are covered by the
> > length of the address structure. If I understand correctly, that means
> > there's no way to derive the length from just the contents of the
> > sockaddr structure. We need
> > the actual length as specified by the caller to know which bytes
> > belong to the address. Note that it's valid for the abstract name to
> > contain Null bytes, so we cannot use those in any way or form to
> > detect whether further bytes belong to the address or not. It seems
> > valid to have an abstract name
> > consisting of 107 Null bytes in sun_path.
>
> Okay, it looks like bpf program is able to set abstract name as well.
> It would be good we have an example for this in selftest.
>
> With abstract address setable by bpf program, I guess you are right,
> we have to let user to explicitly tell us the address length.
>
> I assume it is possible for user to write an address like below:
> "a\0b\0"
> addr_len = offsetof(struct sockaddr_un, sun_path) + 4
> but actually it is illegal, right? We have to validate the
> legality of sun_path/addr_len beyond unix_validate_addr(), right?
>
> >
> >
> > On Tue, 13 Dec 2022 at 06:20, Yonghong Song <yhs@xxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> >>> These hooks allows intercepting bind(), connect(), getsockname(),
> >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> >>> socket hooks get write access to the address length because the
> >>> address length is not fixed when dealing with unix sockets and
> >>> needs to be modified when a unix socket address is modified by
> >>> the hook. Because abstract socket unix addresses start with a
> >>> NUL byte, we cannot recalculate the socket address in kernelspace
> >>> after running the hook by calculating the length of the unix socket
> >>> path using strlen().
> >>
> >> Yes, although we cannot calculate the socket path length with
> >> strlen(). But we still have a method to find the path. In
> >> unix_seq_show(), the unix socket path is calculated as below,
> >>
> >>                   if (u->addr) {  // under a hash table lock here
> >>                           int i, len;
> >>                           seq_putc(seq, ' ');
> >>
> >>                           i = 0;
> >>                           len = u->addr->len -
> >>                                   offsetof(struct sockaddr_un, sun_path);
> >>                           if (u->addr->name->sun_path[0]) {
> >>                                   len--;
> >>                           } else {
> >>                                   seq_putc(seq, '@');
> >>                                   i++;
> >>                           }
> >>                           for ( ; i < len; i++)
> >>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
> >>                                            '@');
> >>                   }
> >>
> >> Is it possible that we can use the above method to find the
> >> address length so we won't need to pass uaddr_len to bpf program?
> >>
> >> Since all other hooks do not need to uaddr_len, you could add some
> >> new hooks for unix socket which can specially calculate uaddr_len
> >> after the bpf program run.
> >>
> >>>
> >>> This hook can be used when users want to multiplex syscall to a
> >>> single unix socket to multiple different processes behind the scenes
> >>> by redirecting the connect() and other syscalls to process specific
> >>> sockets.
> >>> ---
> >>>    include/linux/bpf-cgroup-defs.h |  6 +++
> >>>    include/linux/bpf-cgroup.h      | 29 ++++++++++-
> >>>    include/uapi/linux/bpf.h        | 14 ++++--
> >>>    kernel/bpf/cgroup.c             | 11 ++++-
> >>>    kernel/bpf/syscall.c            | 18 +++++++
> >>>    kernel/bpf/verifier.c           |  7 ++-
> >>>    net/core/filter.c               | 45 +++++++++++++++--
> >>>    net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
> >>>    tools/include/uapi/linux/bpf.h  | 14 ++++--
> >>>    9 files changed, 204 insertions(+), 25 deletions(-)
> >>>
> [...]