Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction

Mikhail Ivanov <ivanov.mikhail1@xxxxxxxxxxxxxxxxxxx> · Fri, 13 Dec 2024 21:19:10 +0300

On 12/12/2024 9:43 PM, Mickaël Salaün wrote:
On Thu, Oct 31, 2024 at 07:21:44PM +0300, Mikhail Ivanov wrote:
On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
Hi Mikhail and Landlock maintainers,

+cc MPTCP list.

Thanks, we should include this list in the next series.

On 17/10/2024 13:04, Mikhail Ivanov wrote:
Do not check TCP access right if socket protocol is not IPPROTO_TCP.
LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
should not restrict bind(2) and connect(2) for non-TCP protocols
(SCTP, MPTCP, SMC).

Thank you for the patch!

I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
see TCP packets with extra TCP options. On Linux, there is indeed a
dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
because we needed such dedicated socket to talk to the userspace.

I don't know Landlock well, but I think it is important to know that an
MPTCP socket can be used to communicate with "plain" TCP packets: the
kernel will fall back to "plain" TCP if MPTCP is not supported by the
other peer or by a middlebox. It means that with this patch, if TCP is
blocked by Landlock, someone can simply force an application to create
an MPTCP socket -- e.g. via LD_PRELOAD -- and bypass the restrictions.
It will certainly work, even when connecting to a peer not supporting
MPTCP.
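
To illustrate the point above, here is a minimal user-space sketch of how
little separates a TCP socket from an MPTCP one at creation time. The
helper name open_stream_socket() is hypothetical, and the fallback value
for IPPROTO_MPTCP is an assumption taken from the Linux UAPI headers, used
only when the libc headers do not define it.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* assumed value, from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper: open an IPv4 stream socket, either plain TCP or
 * MPTCP.  Returns a file descriptor, or -1 with errno set (for example
 * when the running kernel does not support MPTCP).
 */
int open_stream_socket(int use_mptcp)
{
	int proto = use_mptcp ? IPPROTO_MPTCP : IPPROTO_TCP;

	/* Only the protocol argument differs between the two cases. */
	return socket(AF_INET, SOCK_STREAM, proto);
}
```

The rest of the application (bind(2), connect(2), read/write) is the same
for both descriptors, which is why swapping the protocol via LD_PRELOAD
is enough to change which kind of socket a check sees.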

Please note that I'm not against this modification -- especially here
when we remove restrictions around MPTCP sockets :) -- I'm just saying
it might be less confusing for users if MPTCP is considered as being
part of TCP. A bit similar to what someone would do with a firewall: if
TCP is blocked, MPTCP is blocked as well.

Good point!  I don't know MPTCP well but I think you're right.  Given
its close relationship with TCP and the fallback mechanism, it would
make sense for users to not make a difference and it would avoid bypass
of misleading restrictions.  Moreover the Landlock rules are simple and
only control TCP ports, not peer addresses, which seems to be the main
evolution of MPTCP.

I understand that a future goal might probably be to have dedicated
restrictions for MPTCP and the other stream protocols (and/or for all
stream protocols like it was before this patch), but in the meantime, it
might be less confusing considering MPTCP as being part of TCP (I'm not
sure about the other stream protocols).

We need to take a closer look at the other stream protocols indeed.

Hello! Sorry for the late reply, I was on a small business trip.

Thanks a lot for this catch, without doubt MPTCP should be controlled
with TCP access rights.

In that case, we should reconsider current semantics of TCP control.

Currently, it looks like this:
* LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
* LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
   remote port.

According to these definitions only TCP sockets should be restricted,
and with the commit under discussion Landlock already provides this
(assuming that "TCP socket" := user space socket of IPPROTO_TCP
protocol).

AFAICS the two objectives of TCP access rights are to control
(1) which ports can be used for sending or receiving TCP packets
     (including SYN, ACK or other service packets).
(2) which ports can be used to establish TCP connection (performed by
     kernel network stack on server or client side).

In most cases denying (2) also denies (1). Sending or receiving TCP
packets without the initial 3-way handshake is only possible on RAW [1]
or PACKET [2] sockets. Using such sockets requires root privileges, so
there is no point in controlling them with Landlock.

I agree.

Therefore Landlock should only take care of case (2). For now
(please correct me if I'm wrong), we only considered control of
connections performed on user space plain TCP sockets (created with
IPPROTO_TCP).

Correct. Landlock is dedicated to sandbox user space processes and the
related access rights should focus on restricting what is possible
through syscalls (mainly).

TCP kernel sockets are generally used in the following ways:
* in a couple of other user space protocols (MPTCP, SMC, RDS)
* in a few network filesystems (e.g. NFS communication over TCP)

For the second case, TCP connections are currently not restricted by
Landlock. This approach may be correct, since NFS should not have
access to plain TCP communication and TCP restriction of NFS may
be too implicit. Nevertheless, I think that restriction via the current
access rights should be considered.

I'm not sure what you mean here.  I'm not familiar with NFS in the
kernel.  AFAIK there is no socket type for NFS.

An NFS client makes RPC requests to perform remote file operations on
the NFS server. RPC requests can be sent using TCP, UDP, or RDMA sockets
at the transport layer.

Call trace of creating a TCP socket for client->server communication:
	nfs_create_rpc_client()
	rpc_create()
	xprt_create_transport()
	xs_setup_tcp()
	xs_tcp_setup_socket()
	xs_create_sock()

The RPC request is then forwarded to the TCP stack by calling
	xs_tcp_send_request().

For the first case, each protocol uses TCP differently, so they should
be considered separately.

Yes, for user-accessible protocols.

In the case of MPTCP, internal TCP sockets are used to establish
connections and exchange data between two network interfaces. MPTCP
allows having multiple TCP connections between two MPTCP sockets by
connecting different network interfaces (e.g. WIFI and 3G).

Shared Memory Communication is a protocol that allows TCP applications
to transparently use RDMA for communication [3]. An internal TCP socket
is used to exchange service CLC messages when establishing an SMC
connection (which seems harmless for sandboxing) and for communication
in the case of fallback. Fallback happens only if RDMA communication
becomes impossible (e.g. if the RDMA-capable RNIC card goes down on the
host or peer side). So, preventing TCP communication may be achieved by
controlling the fallback mechanism.

Reliable Datagram Socket is a connectionless protocol implemented by
Oracle [4]. It uses the TCP stack or Infiniband to reliably deliver
datagrams. For every sendmsg(2), recvmsg(2) it establishes a TCP
connection and uses it to deliver the split message.

In comparison with the previous protocols, RDS sockets cannot be bound
or connected to specific TCP ports (e.g. with bind(2), connect(2)). Port
16385 is assigned to the receiving side and the sending side is bound to
a port allocated by the kernel (by using zero as the port number).

It may be useful to restrict RDS-over-TCP with the current access
rights, since it allows performing TCP communication from user space.
But it would only be possible to fully allow or deny sending/receiving
(since the ports used are not controlled from user space).

Thanks for these explanations.  The ability to fine-control specific
protocol operations (e.g. connect, bind) can be useful for widely used
protocols such as TCP and UDP (or if someone wants to implement it for
another protocol), but this approach would not scale with all protocols
because of their own semantics and the development efforts.  The Landlock
access rights should be explicit, and we should also be able to deny
access to a whole set of protocols.  This should be partially possible
with your socket creation patch series.  I guess the remaining cases
would be to cover transformation of one socket type to another.  I think
we could control such transformation by building on top of the socket
creation control foundation: instead of controlling socket creation, add
a new access right to control socket transformation.  What do you think?

I agree that implementing fine-control network access rights for other
protocols only to be able to completely restrict TCP operations seems
excessive.

Do you mean the implementation of 2 access rights: for creating and
transforming sockets?

If so, there are only 2 socket protocols that can be transformed to TCP
(in the fallback path) - MPTCP and SMC. Recall that in the case of RDS,
a TCP socket can be used implicitly to deliver an RDS datagram. Let's
assume that the process of configuring TCP as a transport for RDS is
also included in the socket transformation control.

Socket creation control is sufficient to restrict the implicit use of a
TCP connection. Theoretically, separate socket transformation
control is only required if the user wants to use (for example) SMC
sockets with restricted (partially or completely) TCP bind(2) and
connect(2) actions. But SMC (or MPTCP) applications should rely on TCP
communication in case of fallback. I think they are unlikely to have any
TCP restrictions.

However, control of fallback to TCP by applying socket creation rules
is too implicit and inconvenient.

Initially, I thought that users could expect TCP access rights to
completely restrict the corresponding TCP actions without additional
rules for sockets. I have concerns that socket transformation control
would not be explicit enough for such a purpose.

Probably, it would be more correct to apply rules that deny creation of
SMC, MPTCP and RDS sockets (or their transformation to TCP) in
landlock_restrict_self() if TCP actions are not fully allowed?
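
The proposal above can be modelled in user space as a small decision
function. This is only a sketch of the idea, not Landlock code: the
MODEL_* bits, both function names, and the simplification that "not
fully allowed" means "at least one TCP action is restricted" are all
assumptions made for illustration. The fallback numeric values are
assumed from the Linux headers and only used if libc does not define
them.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* assumed value, from the Linux UAPI headers */
#endif
#ifndef AF_RDS
#define AF_RDS 21 /* assumed value, from the Linux headers */
#endif
#ifndef AF_SMC
#define AF_SMC 43 /* assumed value, from the Linux headers */
#endif

/* Hypothetical bits for this model only. */
#define MODEL_BIND_TCP    (1u << 0)
#define MODEL_CONNECT_TCP (1u << 1)
#define MODEL_ALL_TCP     (MODEL_BIND_TCP | MODEL_CONNECT_TCP)

/* Socket kinds that can end up using TCP implicitly or via fallback. */
static int may_reach_tcp(int family, int type, int protocol)
{
	if (family == AF_SMC || family == AF_RDS)
		return 1;
	return (family == AF_INET || family == AF_INET6) &&
	       type == SOCK_STREAM && protocol == IPPROTO_MPTCP;
}

/*
 * Return 1 if creating the socket should be denied: some TCP action is
 * restricted and the requested socket could reach TCP behind the
 * sandbox's back.  Plain TCP sockets are not denied here because their
 * bind/connect calls are checked per port anyway.
 */
int model_deny_socket_create(unsigned int restricted_tcp,
			     int family, int type, int protocol)
{
	if (!(restricted_tcp & MODEL_ALL_TCP))
		return 0;
	return may_reach_tcp(family, type, protocol);
}
```

For example, with only MODEL_CONNECT_TCP restricted, an AF_SMC stream
socket would be denied while an ordinary IPPROTO_TCP socket would still
be created and checked at connect(2) time.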

Restricting any TCP connection in the kernel is probably the simplest
design, but we should consider the above cases to provide the most
useful one.

[1] https://man7.org/linux/man-pages/man7/raw.7.html
[2] https://man7.org/linux/man-pages/man7/packet.7.html
[3] https://datatracker.ietf.org/doc/html/rfc7609
[4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html

sk_is_tcp() is used for this, to check the address family of the socket
before doing INET-specific address length validation. This is required
for error consistency.

Closes: https://github.com/landlock-lsm/linux/issues/40
Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")

I don't know how fixes are considered in Landlock, but should this patch
be considered as a fix? It might be surprising for someone who thought
all "stream" connections were blocked to have them unblocked when
updating to a minor kernel version, no?

Indeed.  The main issue was with the semantics/definition of
LANDLOCK_ACCESS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
code with the documentation, one way or the other, preferably following
the principle of least astonishment.

(Personally, I would understand such behaviour change when upgrading to
a major version, and still, maybe only if there were alternatives to

This "fix" needs to be backported, but we're not clear yet on what it
should be. :)

continue having the same behaviour, e.g. a way to restrict all stream
sockets the same way, or something per stream socket. But that's just me
:) )

The documentation and the initial idea were to control TCP bind and
connect.  The kernel implementation does more than that, so we need to
synchronize them somehow.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.