> >> On 12/10/22 11:35 AM, Daan De Meyer wrote: > >>> These hooks allows intercepting bind(), connect(), getsockname(), > >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix > >>> socket hooks get write access to the address length because the > >>> address length is not fixed when dealing with unix sockets and > >>> needs to be modified when a unix socket address is modified by > >>> the hook. Because abstract socket unix addresses start with a > >>> NUL byte, we cannot recalculate the socket address in kernelspace > >>> after running the hook by calculating the length of the unix socket > >>> path using strlen(). > >> > >> Yes, although we cannot calculate the socket path length with > >> strlen(). But we still have a method to find the path. In > >> unix_seq_show(), the unix socket path is calculated as below, > >> > >> if (u->addr) { // under a hash table lock here > >> int i, len; > >> seq_putc(seq, ' '); > >> > >> i = 0; > >> len = u->addr->len - > >> offsetof(struct sockaddr_un, sun_path); > >> if (u->addr->name->sun_path[0]) { > >> len--; > >> } else { > >> seq_putc(seq, '@'); > >> i++; > >> } > >> for ( ; i < len; i++) > >> seq_putc(seq, u->addr->name->sun_path[i] ?: > >> '@'); > >> } > >> > >> Is it possible that we can use the above method to find the > >> address length so we won't need to pass uaddr_len to bpf program? > >> > >> Since all other hooks do not need to uaddr_len, you could add some > >> new hooks for unix socket which can specially calculate uaddr_len > >> after the bpf program run. > > > > I don't think we can. If we look at the definition of abstract unix > > socket in the official man page: > > > >> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0'). The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure. (Null bytes in > >> the name have no special significance.) The name has no connection with filesystem pathnames. When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen - > >> sizeof(sa_family_t)) bytes of sun_path. > > > > This specifically says that the address in the abstract namespace is > > given by the additional bytes in sun_path that are covered by the > > length of the address structure. If I understand correctly, that means > > there's no way to derive the length from just the contents of the > > sockaddr structure. We need > > the actual length as specified by the caller to know which bytes > > belong to the address. Note that it's valid for the abstract name to > > contain Null bytes, so we cannot use those in any way or form to > > detect whether further bytes belong to the address or not. It seems > > valid to have an abstract name > > consisting of 107 Null bytes in sun_path. > > Okay, it looks like bpf program is able to set abstract name as well. > It would be good we have an example for this in selftest. > > With abstract address setable by bpf program, I guess you are right, > we have to let user to explicitly tell us the address length. > > I assume it is possible for user to write an address like below: > "a\0b\0" > addr_len = offsetof(struct sockaddr_un, sun_path) + 4 > but actually it is illegal, right? We have to validate the > legality of sun_path/addr_len beyond unix_validate_addr(), right? This is not actually illegal according to the man page I think, let's look at the following quote from the man page: > Pathname sockets > When binding a socket to a pathname, a few rules should be observed for maximum portability and ease of coding: > > * The pathname in sun_path should be null-terminated. > > * The length of the pathname, including the terminating null byte, should not exceed the size of sun_path. > > * The addrlen argument that describes the enclosing sockaddr_un structure should have a value of at least: > > offsetof(struct sockaddr_un, sun_path)+strlen(addr.sun_path)+1 > > or, more simply, addrlen can be specified as sizeof(struct sockaddr_un). So when doing a pathname based path, the address length is allowed to be bigger than the actual path. So I don't think we need to do any more validation than what is done by unix_validate_addr(). The selftests are already using abstract unix sockets because they don't need any cleanup. On Tue, 13 Dec 2022 at 21:54, Yonghong Song <yhs@xxxxxxxx> wrote: > > > > On 12/13/22 3:36 AM, Daan De Meyer wrote: > >> On 12/10/22 11:35 AM, Daan De Meyer wrote: > >>> These hooks allows intercepting bind(), connect(), getsockname(), > >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix > >>> socket hooks get write access to the address length because the > >>> address length is not fixed when dealing with unix sockets and > >>> needs to be modified when a unix socket address is modified by > >>> the hook. Because abstract socket unix addresses start with a > >>> NUL byte, we cannot recalculate the socket address in kernelspace > >>> after running the hook by calculating the length of the unix socket > >>> path using strlen(). > >> > >> Yes, although we cannot calculate the socket path length with > >> strlen(). But we still have a method to find the path. In > >> unix_seq_show(), the unix socket path is calculated as below, > >> > >> if (u->addr) { // under a hash table lock here > >> int i, len; > >> seq_putc(seq, ' '); > >> > >> i = 0; > >> len = u->addr->len - > >> offsetof(struct sockaddr_un, sun_path); > >> if (u->addr->name->sun_path[0]) { > >> len--; > >> } else { > >> seq_putc(seq, '@'); > >> i++; > >> } > >> for ( ; i < len; i++) > >> seq_putc(seq, u->addr->name->sun_path[i] ?: > >> '@'); > >> } > >> > >> Is it possible that we can use the above method to find the > >> address length so we won't need to pass uaddr_len to bpf program? > >> > >> Since all other hooks do not need to uaddr_len, you could add some > >> new hooks for unix socket which can specially calculate uaddr_len > >> after the bpf program run. > > > > I don't think we can. If we look at the definition of abstract unix > > socket in the official man page: > > > >> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0'). The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure. (Null bytes in > >> the name have no special significance.) The name has no connection with filesystem pathnames. When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen - > >> sizeof(sa_family_t)) bytes of sun_path. > > > > This specifically says that the address in the abstract namespace is > > given by the additional bytes in sun_path that are covered by the > > length of the address structure. If I understand correctly, that means > > there's no way to derive the length from just the contents of the > > sockaddr structure. We need > > the actual length as specified by the caller to know which bytes > > belong to the address. Note that it's valid for the abstract name to > > contain Null bytes, so we cannot use those in any way or form to > > detect whether further bytes belong to the address or not. It seems > > valid to have an abstract name > > consisting of 107 Null bytes in sun_path. > > Okay, it looks like bpf program is able to set abstract name as well. > It would be good we have an example for this in selftest. > > With abstract address setable by bpf program, I guess you are right, > we have to let user to explicitly tell us the address length. > > I assume it is possible for user to write an address like below: > "a\0b\0" > addr_len = offsetof(struct sockaddr_un, sun_path) + 4 > but actually it is illegal, right? We have to validate the > legality of sun_path/addr_len beyond unix_validate_addr(), right? > > > > > > > On Tue, 13 Dec 2022 at 06:20, Yonghong Song <yhs@xxxxxxxx> wrote: > >> > >> > >> > >> On 12/10/22 11:35 AM, Daan De Meyer wrote: > >>> These hooks allows intercepting bind(), connect(), getsockname(), > >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix > >>> socket hooks get write access to the address length because the > >>> address length is not fixed when dealing with unix sockets and > >>> needs to be modified when a unix socket address is modified by > >>> the hook. Because abstract socket unix addresses start with a > >>> NUL byte, we cannot recalculate the socket address in kernelspace > >>> after running the hook by calculating the length of the unix socket > >>> path using strlen(). > >> > >> Yes, although we cannot calculate the socket path length with > >> strlen(). But we still have a method to find the path. In > >> unix_seq_show(), the unix socket path is calculated as below, > >> > >> if (u->addr) { // under a hash table lock here > >> int i, len; > >> seq_putc(seq, ' '); > >> > >> i = 0; > >> len = u->addr->len - > >> offsetof(struct sockaddr_un, sun_path); > >> if (u->addr->name->sun_path[0]) { > >> len--; > >> } else { > >> seq_putc(seq, '@'); > >> i++; > >> } > >> for ( ; i < len; i++) > >> seq_putc(seq, u->addr->name->sun_path[i] ?: > >> '@'); > >> } > >> > >> Is it possible that we can use the above method to find the > >> address length so we won't need to pass uaddr_len to bpf program? > >> > >> Since all other hooks do not need to uaddr_len, you could add some > >> new hooks for unix socket which can specially calculate uaddr_len > >> after the bpf program run. > >> > >>> > >>> This hook can be used when users want to multiplex syscall to a > >>> single unix socket to multiple different processes behind the scenes > >>> by redirecting the connect() and other syscalls to process specific > >>> sockets. > >>> --- > >>> include/linux/bpf-cgroup-defs.h | 6 +++ > >>> include/linux/bpf-cgroup.h | 29 ++++++++++- > >>> include/uapi/linux/bpf.h | 14 ++++-- > >>> kernel/bpf/cgroup.c | 11 ++++- > >>> kernel/bpf/syscall.c | 18 +++++++ > >>> kernel/bpf/verifier.c | 7 ++- > >>> net/core/filter.c | 45 +++++++++++++++-- > >>> net/unix/af_unix.c | 85 +++++++++++++++++++++++++++++---- > >>> tools/include/uapi/linux/bpf.h | 14 ++++-- > >>> 9 files changed, 204 insertions(+), 25 deletions(-) > >>> > [...]