Hi all,

As discovered by our QE, there is a problem with how the (userspace-facing)
sockets returned by accept(2) are labeled when using MPTCP. Currently they
always end up with the label representing the kernel (typically
system_u:system_r:kernel_t:s0), while they should inherit the context from
the parent socket (the one that is passed to accept(2)).

A minimal reproducer on a Fedora/CentOS/RHEL system:

    # Install dependencies
    dnf install -y mptcpd nginx curl

    # Disable rules that silence some SELinux denials
    semodule -DB

    # Set up a dummy file to be served by nginx
    echo test > /usr/share/nginx/html/testfile
    chmod +r /usr/share/nginx/html/testfile

    # Set up nginx to use MPTCP
    sysctl -w net.mptcp.enabled=1
    systemctl stop nginx
    mptcpize enable nginx
    systemctl start nginx

    # This will fail (no reply from server)
    mptcpize run curl -k -o /dev/null http://127.0.0.1/testfile

    # This will show the SELinux denial that caused the failure
    ausearch -i -m avc | grep httpd

It is also possible to trigger the issue by running the selinux-testsuite
[1] under `mptcpize run` (it will fail on the inet_socket test in multiple
places).

Based on what I could infer from the net & mptcp code, this is roughly how
it happens (may be inaccurate or incorrect - the maze of the networking
stack is not easy to navigate for me):

1. When the server starts, the main mptcp socket is created:
   socket(2) -> ... -> sock_create() -> inet_create() ->
   mptcp_init_sock() -> __mptcp_socket_create()
2. __mptcp_socket_create() calls mptcp_subflow_create_socket(), which
   creates another "kern" socket, which represents the initial(?) subflow.
3. This subflow socket goes through security_socket_post_create() ->
   selinux_socket_post_create(), which gives it the kernel label because
   kern == 1 marks it as a kernel-internal socket.
4. The main socket goes through its own selinux_socket_post_create(),
   which gives it the label based on the current task.
5. Later, when the client connection is accepted via accept(2) on the
   main socket, an underlying accept operation is performed on the subflow
   socket, and the resulting socket is returned directly as the result of
   the accept(2) syscall.
6. Since this socket is cloned from the subflow socket, it inherits the
   kernel label from the original subflow socket (via
   selinux_inet_conn_request() and selinux_inet_csk_clone()).
   selinux_sock_graft() then also copies the label onto the inode
   representing the socket.
7. When nginx later calls writev(2) on the new socket,
   selinux_file_permission() uses the inode label as the target in a
   tcp_socket::write permission check. This is denied, as in the Fedora
   policy httpd_t isn't allowed to write to kernel_t TCP sockets.

Side note: There is currently an odd conditional in sock_has_perm() in
security/selinux/hooks.c that skips SELinux permission checking for
sockets that have the kernel label, so native socket operations (such as
recv(2), send(2), recvmsg(2), ...) will not uncover this problem; only
generic file operations such as read(2), write(2), writev(2), etc. will.
I believe that check shouldn't be there, but that's for another
discussion...

So now the big question is: How to fix this? I can think of several
possible solutions, but none of them seems to be the obvious correct one:

1. Wrap the socket cloned from the subflow socket in another socket
   (similar to how the main socket + subflow(s) are handled), which would
   be cloned from the non-kern outer socket that has the right label. This
   could have the disadvantage of adding unnecessary overhead, but would
   probably be simpler to do.
2. Somehow ensure that the cloned socket gets the label from the main
   socket instead of the subflow socket. This would probably require
   adding a new LSM hook and I'm not sure at all what would be the best
   way to implement this.
3. Somehow communicate the subflow socket <-> main socket relationship to
   the LSM layer so that it can switch to use the label of the main socket
   when handling an operation on a subflow socket (thus copying the label
   correctly on accept(2)). Not a great solution, as it requires each LSM
   that labels sockets to duplicate the indirection logic.
4. Do not create the subflow sockets as "kern". (Not sure if that would
   be desirable.)
5. Stop labeling kern sockets with the kernel's label on the SELinux side
   and just label them based on the current task as usual. (This would
   probably cause other issues, but maybe not...)

Any ideas, suggestions, or patches welcome!

[1] https://github.com/SELinuxProject/selinux-testsuite/

--
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.
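
P.S. For anyone who finds steps 1-7 above hard to follow, here is a toy userspace model of the label flow in Python. It is only a sketch under my own assumptions, not kernel code: the names Sock, post_create(), accept_clone() and may_write() are hypothetical stand-ins for the hooks mentioned above, the httpd context string is an assumed example, and the one-rule "policy" is a made-up simplification of the Fedora policy.

```python
from dataclasses import dataclass

# Label from the report; HTTPD_LABEL is an assumed example context.
KERNEL_LABEL = "system_u:system_r:kernel_t:s0"
HTTPD_LABEL = "system_u:system_r:httpd_t:s0"


@dataclass
class Sock:
    """Hypothetical stand-in for a socket carrying an SELinux label."""
    label: str


def post_create(task_label: str, kern: bool) -> Sock:
    """Toy model of steps 3-4: kern sockets get the kernel label,
    all other sockets get the label of the creating task."""
    return Sock(KERNEL_LABEL if kern else task_label)


def accept_clone(parent: Sock) -> Sock:
    """Toy model of step 6: the accepted socket inherits the label of
    the socket it is cloned from."""
    return Sock(parent.label)


def may_write(task_label: str, sock: Sock, policy: set) -> bool:
    """Toy model of step 7: allow the write only if the policy has a
    (source, target) rule for this pair of labels."""
    return (task_label, sock.label) in policy


# Made-up policy: httpd may write only to sockets with its own label.
POLICY = {(HTTPD_LABEL, HTTPD_LABEL)}

if __name__ == "__main__":
    main_sock = post_create(HTTPD_LABEL, kern=False)  # step 4
    subflow = post_create(HTTPD_LABEL, kern=True)     # steps 2-3

    # Current behaviour: the accepted socket is cloned from the subflow.
    accepted = accept_clone(subflow)
    print("current:", accepted.label, may_write(HTTPD_LABEL, accepted, POLICY))

    # Desired behaviour: the label is inherited from the main socket.
    wanted = accept_clone(main_sock)
    print("desired:", wanted.label, may_write(HTTPD_LABEL, wanted, POLICY))
```

In this model the "current" case ends up with the kernel label and the write is refused, while the "desired" case carries the httpd label and the write is allowed. All of the proposed fixes amount to changing which socket plays the role of `parent` in accept_clone(), or what post_create() does for kern sockets.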