Hello, On Thu, 2022-12-01 at 14:42 +0100, Ondrej Mosnacek wrote: > As discovered by our QE, there is a problem with how the > (userspace-facing) sockets returned by accept(2) are labeled when > using MPTCP. Currently they always end up with the label representing > the kernel (typically system_u:system_r:kernel_t:s0), white they > should inherit the context from the parent socket (the one that is > passed to accept(2)). > > A minimal reproducer on a Fedora/CentOS/RHEL system: > > # Install dependencies > dnf install -y mptcpd nginx curl > # Disable rules that silence some SELinux denials > semodule -DB > # Set up a dummy file to be served by nginx > echo test > /usr/share/nginx/html/testfile > chmod +r /usr/share/nginx/html/testfile > # Set up nginx to use MPTCP > sysctl -w net.mptcp.enabled=1 > systemctl stop nginx > mptcpize enable nginx > systemctl start nginx > # This will fail (no reply from server) > mptcpize run curl -k -o /dev/null http://127.0.0.1/testfile > # This will show the SELinux denial that caused the failure > ausearch -i -m avc | grep httpd > > It is also possible to trigger the issue by running the > selinux-testsuite [1] under `mptcpize run` (it will fail on the > inet_socket test in multiple places). > > Based on what I could infer from the net & mptcp code, this is roughly > how it happens (may be inaccurate or incorrect - the maze of the > networking stack is not easy to navigate for me): > 1. When the server starts, the main mptcp socket is created: > socket(2) -> ... -> socket_create() -> inet_create() -> > mptcp_init_sock() -> __mptcp_socket_create() > 2. __mptcp_socket_create() calls mptcp_subflow_create_socket(), which > creates another "kern" socket, which represents the initial(?) > subflow. > 3. This subflow socket goes through security_socket_post_create() -> > selinux_socket_post_create(), which gives it a kernel label based on > kern == 1, which indicates that it's a kernel-internal socket. > 4. The main socket goes through its own selinux_socket_post_create(), > which gives it the label based on the current task. > 5. Later, when the client connection is accepted via accept(2) on the > main socket, an underlying accept operation is performed on the > subflow socket, which is then returned directly as the result of the > accept(2) syscall. > 6. Since this socket is cloned from the subflow socket, it inherits > the kernel label from the original subflow socket (via > selinux_inet_conn_request() and selinux_inet_csk_clone()). > selinux_sock_graft() then also copies the label onto the inode > representing the socket. > 7. When nginx later calls writev(2) on the new socket, > selinux_file_permission() uses the inode label as the target in a > tcp_socket::write permission check. This is denied, as in the Fedora > policy httpd_t isn't allowed to write to kernel_t TCP sockets. > > Side note: There is currently an odd conditional in sock_has_perm() in > security/selinux/hooks.c that skips SELinux permission checking for > sockets that have the kernel label, so native socket operations (such > as recv(2), send(2), recvmsg(2), ...) will not uncover this problem, > only generic file operations such as read(2), write(2), writev(2), > etc. I believe that check shouldn't be there, but that's for another > discussion... That comes from: commit e2943dca2d5b67e9578111986495483fe720d58b Author: James Morris <jmorris@xxxxxxxxxx> Date: Sat May 8 01:00:33 2004 -0700 [NET]: Add sock_create_kern() """ This addresses a class of potential issues in SELinux where, for example, a TCP NFS session times out, then the kernel re-establishes an RPC connection upon further user activity. We do not want such kernel created sockets to be labeled with user security contexts. """ > So now the big question is: How to fix this? I can think of several > possible solutions, but neither of them seems to be the obvious > correct one: > 1. Wrap the socket cloned from the subflow socket in another socket > (similar to how the main socket + subflow(s) are handled), which would > be cloned from the non-kern outer socket that has the right label. > This could have the disadvantage of adding unnecessary overhead, but > would probably be simpler to do. I would avoid that option: we already have a suboptimal amount of indirection. > 2. Somehow ensure that the cloned socket gets the label from the main > socket instead of the subflow socket. This would probably require > adding a new LSM hook and I'm not sure at all what would be the best > way to implement this. > 3. Somehow communicate the subflow socket <-> main socket relationship > to the LSM layer so that it can switch to use the label of the main > socket when handling an operation on a subflow socket (thus copying > the label correctly on accept(2)). Not a great solution, as it > requires each LSM that labels sockets to duplicate the indirection > logic. > 4. Do not create the subflow sockets as "kern". (Not sure if that > would be desirable.) No, we need subflow being kernel sockets. Lockdep will bail otherwise, and the would be the tip of the iceberg. > 5. Stop labeling kern sockets with the kernel's label on the SELinux > side and just label them based on the current task as usual. (This > would probably cause other issues, but maybe not...) > > Any ideas, suggestions, or patches welcome! I think something alike the following could work - not even tested, with comments on bold assumptions. I'll try to do some testing later. --- diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 99f5e51d5ca4..6cad50c6fd24 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -102,6 +102,20 @@ static int __mptcp_socket_create(struct mptcp_sock *msk) if (err) return err; + /* The first subflow can later be indirectly exposed to security + * relevant syscall alike accept() and bind(), and at this point + * carries a 'kern' related security context. + * Reset the security context to the relevant user-space one. + * Note that the following assumes security_socket_post_create() + * being idempotent + */ + err = security_socket_post_create(ssock, sk->sk_family, SOCK_STREAM, + IPPROTO_TCP, 0); + if (err) { + sock_release(ssock); + return err; + } + msk->first = ssock->sk; msk->subflow = ssock; subflow = mptcp_subflow_ctx(ssock->sk);