Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd

Tingmao Wang <m@xxxxxxxxxx> · Mon, 10 Mar 2025 00:41:28 +0000

On 3/5/25 16:09, Mickaël Salaün wrote:
On Tue, Mar 04, 2025 at 01:13:00AM +0000, Tingmao Wang wrote:
We allow the user to pass an additional flag to landlock_create_ruleset
that makes the ruleset operate in "supervise" mode, with a supervisor
attached. We add space to the landlock_ruleset_attr structure to pass
the newly created supervisor fd back to user space.

The intention, although not implemented yet, is that user space will read
events from this fd and write responses back to it.

Note: I still need to investigate whether the fd is handled correctly when
it is duplicated on fork(), but it should be fine as long as both copies
share the same struct file. We might also want to let the user customize
the flags on this fd, so that they can opt out of O_CLOEXEC.
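A minimal user-space sketch of both points, assuming an unlinked temporary file as a stand-in for the supervisor fd (nothing here is Landlock-specific, and the helper names are made up for illustration). After fork() the child's copy refers to the same open file description, so a write in the child moves the offset the parent sees. O_CLOEXEC only takes effect at execve(), so a user who wants to keep the fd across exec could clear FD_CLOEXEC with fcntl().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is not, -1 on error. */
static int has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}

/* Clears FD_CLOEXEC so that fd survives a later execve(). */
static int clear_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/*
 * Lets a forked child write through its inherited copy of fd, then returns
 * the file offset seen by the parent (or -1 on error).  A result equal to
 * len shows that both descriptors share one open file description.
 */
static off_t offset_after_child_write(int fd, const char *buf, size_t len)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(write(fd, buf, len) == (ssize_t)len ? 0 : 1);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		return -1;
	return lseek(fd, 0, SEEK_CUR);
}

/*
 * Stand-in for the supervisor fd: an unlinked temporary file opened with
 * O_CLOEXEC (assumed here to be the default the patch would use).
 */
static int make_demo_fd(void)
{
	char path[] = "/tmp/supervisor-fd-demoXXXXXX";
	int fd = mkostemp(path, O_CLOEXEC);

	if (fd >= 0)
		unlink(path);
	return fd;
}
```

In this sketch, opting out of O_CLOEXEC is a plain fcntl() call after creation, which is one reason an extra creation flag may not be strictly needed.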

NOTE: despite this patch adding a new UAPI, I'm still very open to e.g.
reusing the fanotify infrastructure instead (if that makes sense in the
end). This is just a PoC.

The main security risk of this feature is for this FD to leak and be
used by a sandboxed process to bypass all its restrictions.  This should
be highlighted in the UAPI documentation.


Signed-off-by: Tingmao Wang <m@xxxxxxxxxx>
---
  include/uapi/linux/landlock.h |  10 ++++
  security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
  2 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index e1d2c27533b4..7bc1eb4859fb 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
  	 * resources (e.g. IPCs).
  	 */
  	__u64 scoped;
+	/**
+	 * @supervisor_fd: Placeholder to store the supervisor file
+	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
+	 */
+	__s32 supervisor_fd;

This interface would require ruleset_attr to become writable by the
kernel, which might be OK in theory but requires updating the current
syscall wrapper signature (see the sandboxer.c change).  It also creates
an FD which might not be useful (e.g. if an error occurs before the
actual enforcement).

I see a few alternatives.  We could simply use (or extend) the ruleset
FD instead of creating a new one, but because leaking a ruleset FD is
not a security risk today, we should be careful not to change that.

Another approach, similar to seccomp unotify, is to get a
"[landlock-domain]" FD returned by landlock_restrict_self(2) when a
new LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag is set.  This FD would be a
reference to the newly created domain, which is more specific than the
ruleset used to create this domain (and that can be used to create
other domains).  This domain FD could be used for introspection (i.e.
to get read-only properties such as the domain ID), but being able to
directly supervise the referenced domain with only this FD would be a
risk that we should limit.

What we could do is implement an IOCTL command for such a domain FD that
would return a supervisor FD (if the LANDLOCK_RESTRICT_SELF_SUPERVISED
flag was also set).  The key point is to check (once) that the
process calling this IOCTL is not restricted by the related domain (see
the scope helpers).
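A user-space sketch of how such an IOCTL might be called. HYPOTHETICAL_LANDLOCK_IOC_GET_SUPERVISOR_FD, its 'L'/0x80 encoding and get_supervisor_fd() are invented for illustration; no such command exists, and the "not restricted by the related domain" check would live in the kernel, not in this wrapper. What the sketch does exercise is real kernel behavior: an IOCTL command that an FD type does not implement (a pipe here) fails with ENOTTY, so the command only has an effect on FDs that were designed to accept it.

```c
#include <assert.h>
#include <errno.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * HYPOTHETICAL: no such command exists in the Landlock UAPI.  The magic
 * ('L') and number (0x80) are made up for this sketch.
 */
#define HYPOTHETICAL_LANDLOCK_IOC_GET_SUPERVISOR_FD _IO('L', 0x80)

/*
 * Asks a (hypothetical) domain FD for a supervisor FD.  Returns the new fd,
 * or -1 with errno set.  In the proposed design, the kernel would refuse
 * the request if the caller is itself restricted by the referenced domain.
 */
static int get_supervisor_fd(int domain_fd)
{
	return ioctl(domain_fd, HYPOTHETICAL_LANDLOCK_IOC_GET_SUPERVISOR_FD);
}

/* Returns the errno observed when the command is sent to a pipe. */
static int errno_for_non_domain_fd(void)
{
	int fds[2], err = 0;

	if (pipe(fds) < 0)
		return errno;
	if (get_supervisor_fd(fds[0]) < 0)
		err = errno;
	close(fds[0]);
	close(fds[1]);
	return err;
}
```

If such a command existed, a supervisor outside the domain would call get_supervisor_fd() on a domain FD it received and treat a -1 return as "not allowed" or "not supervised".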

Is LANDLOCK_RESTRICT_SELF_DOMAIN_FD part of your (upcoming?)
introspection patch? (I'm wondering when someone would pass only that
flag and not LANDLOCK_RESTRICT_SELF_SUPERVISED, or vice versa.)

By the way, is it alright to conceptually tie the supervisor to a
domain? It would really be a layer inside a domain: the domain could
have earlier or later layers that deny access without supervision, or
the supervisor of an earlier layer could deny access first. That is why
having the supervisor fd come out of the ruleset felt sensible to me at
first.

Also, aren't the check that "the process calling this IOCTL is not
restricted by the related domain" and the fact that the IOCTL is issued
on the domain fd, which is returned by landlock_restrict_self (i.e. to
the process that has just restricted itself), somewhat contradictory?
It is a sensible check, but it highlights that this interface is
slightly awkward: basically every caller is forced into a setup where
the child sends the domain fd back to the parent.
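One common way to do that hand-off is SCM_RIGHTS over an AF_UNIX socket. A minimal sketch, assuming a pipe read end as a stand-in for the (hypothetical) domain fd: the child creates it, writes a marker, and passes it to the parent, which reads the marker back through the received descriptor. send_fd(), recv_fd() and demo_child_to_parent() are illustrative helpers, not existing APIs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control buffer correctly aligned for one SCM_RIGHTS descriptor. */
union fd_cmsg {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Sends fd (plus one dummy data byte) over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = u.buf,
			      .msg_controllen = sizeof(u.buf) };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives an fd sent by send_fd(); returns it (close-on-exec) or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = u.buf,
			      .msg_controllen = sizeof(u.buf) };
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/*
 * The child creates a pipe (stand-in for the domain fd), writes "ok" into
 * it and passes the read end to the parent.  Returns how many bytes the
 * parent read back through the received fd, or -1 on error.
 */
static ssize_t demo_child_to_parent(char *out, size_t outlen)
{
	int sv[2], fd;
	ssize_t n = -1;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		int p[2];

		if (pipe(p) < 0 || write(p[1], "ok", 2) != 2 ||
		    send_fd(sv[1], p[0]) < 0)
			_exit(1);
		_exit(0);
	}
	/* Close our copy so recvmsg() sees EOF if the child fails. */
	close(sv[1]);
	fd = recv_fd(sv[0]);
	if (fd >= 0) {
		n = read(fd, out, outlen);
		close(fd);
	}
	waitpid(pid, NULL, 0);
	close(sv[0]);
	return n;
}
```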


Relying on IOCTL commands (for all these FD types) instead of read/write
operations should also limit the risk of these FDs being misused through
a confused deputy attack (because such an IOCTL command would convey
explicit intent):
https://docs.kernel.org/security/credentials.html#open-file-credentials
https://lore.kernel.org/all/CAG48ez0HW-nScxn4G5p8UHtYy=T435ZkF3Tb1ARTyyijt_cNEg@xxxxxxxxxxxxxx/
We should take inspiration from seccomp unotify for this too:
https://lore.kernel.org/all/20181209182414.30862-1-tycho@xxxxxxxx/
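To illustrate the confused deputy concern with a write-based interface, here is a toy sketch. A pipe write end stands in for a supervisor FD that accepts responses through write(2), and the "allow <id>" text format is made up (the patch only says responses would be written back to the fd). If an attacker makes that FD the standard output of a trusted helper that echoes attacker-chosen text, the helper ends up sending a response it never meant to send. A helper like this would not issue a specific IOCTL command by accident.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * A trusted "deputy" that echoes an attacker-chosen string to stdout,
 * e.g. a setuid program quoting a bad argument in an error message.
 */
static void deputy_echo(const char *msg)
{
	ssize_t ignored = write(STDOUT_FILENO, msg, strlen(msg));

	(void)ignored;
}

/*
 * The pipe's write end stands in for a supervisor fd that takes responses
 * via write(2), in a HYPOTHETICAL "allow <id>\n" text format.  Returns the
 * number of bytes the "supervisor" side received, copied into out.
 */
static ssize_t simulate_confused_deputy(const char *attacker_msg, char *out,
					size_t outlen)
{
	int p[2], saved;
	ssize_t n;

	if (pipe(p) < 0)
		return -1;
	saved = dup(STDOUT_FILENO);
	/* The attacker makes the supervisor fd the deputy's stdout. */
	if (saved < 0 || dup2(p[1], STDOUT_FILENO) < 0) {
		if (saved >= 0)
			close(saved);
		close(p[0]);
		close(p[1]);
		return -1;
	}
	deputy_echo(attacker_msg);
	/* Restore the real stdout, then drain what the supervisor got. */
	dup2(saved, STDOUT_FILENO);
	close(saved);
	close(p[1]);
	n = read(p[0], out, outlen);
	close(p[0]);
	return n;
}
```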

I think in the seccomp unotify case the problem arises from data that the
setuid binary considers ordinary being interpreted by the kernel as an
fd, and thus having a different effect depending on whether the attacker
or the setuid app writes it.  In our case I *think* we should be
alright, but maybe we should go with IOCTL anyway... However, how would
using netlink messages (suggested in a different thread) affect this, if
we do end up using them?  Would we have to send netlink messages via
IOCTL?


+	/**
+	 * @pad: Unused, must be zero.
+	 */
+	__u32 pad;

In this case we should pack the struct instead.
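A small sketch of the layout difference, using <stdint.h> types in place of __u64/__s32 and made-up struct names. With the explicit @pad member the struct is 32 bytes; packed, as struct landlock_path_beneath_attr already is in this header, it is 28 bytes with supervisor_fd at offset 24. Without either, the tail padding would be implicit and ABI-dependent (e.g. 28 bytes on 32-bit x86, where 64-bit members are only 4-byte aligned inside structs, versus 32 bytes on x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as in this patch: explicit trailing @pad member. */
struct demo_attr_padded {
	uint64_t handled_access_fs;
	uint64_t handled_access_net;
	uint64_t scoped;
	int32_t supervisor_fd;
	uint32_t pad; /* "Unused, must be zero." */
};

/* Suggested alternative: packed, so there is no trailing padding at all. */
struct demo_attr_packed {
	uint64_t handled_access_fs;
	uint64_t handled_access_net;
	uint64_t scoped;
	int32_t supervisor_fd;
} __attribute__((packed));

/* Neither: the compiler adds ABI-dependent trailing padding. */
struct demo_attr_implicit {
	uint64_t handled_access_fs;
	uint64_t handled_access_net;
	uint64_t scoped;
	int32_t supervisor_fd;
};

/* These hold regardless of whether 64-bit members are 4- or 8-aligned. */
_Static_assert(sizeof(struct demo_attr_padded) == 32, "padded size");
_Static_assert(sizeof(struct demo_attr_packed) == 28, "packed size");
_Static_assert(offsetof(struct demo_attr_packed, supervisor_fd) == 24,
	       "supervisor_fd offset");
```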

  };
  
  /*
@@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
   */
  /* clang-format off */
  #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
+#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
  /* clang-format on */
  
  /**

[...]