Re: [RFC PATCH 5/9] Define user structure for events and responses.

Tingmao Wang <m@xxxxxxxxxx> · Mon, 10 Mar 2025 00:39:08 +0000

On 3/8/25 19:07, Mickaël Salaün wrote:
On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
On 3/4/25 19:49, Mickaël Salaün wrote:
On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
[...]
+	/**
+	 * @cookie: Opaque identifier to be included in the response.
+	 */
+	__u32 cookie;

I guess we could use a __u64 index counter per layer instead.  That
would also help to order requests if they are treated by different
supervisor threads.

I don't immediately see a use for ordering requests (if we get more than one
event at once, they are coming from different threads anyway so there can't
be any dependencies between them, and the supervisor threads can use
timestamps), but I think making it a __u64 is probably a good idea
regardless, as it means we don't have to do some sort of ID allocation, and
can just increment an atomic.

Indeed, we should follow the seccomp unotify approach with a random u64
incremented per request.

Do you mean a random starting value, incremented by one per request, or 
something like the landlock_id in the audit patch (random increments too)?

+};
+
+struct landlock_supervise_event {
+	struct landlock_supervise_event_hdr hdr;
+	__u64 access_request;
+	__kernel_pid_t accessor;
+	union {
+		struct {
+			/**
+			 * @fd1: An open file descriptor for the file (open,
+			 * delete, execute, link, readdir, rename, truncate),
+			 * or the parent directory (for create operations
+			 * targeting its child) being accessed.  Must be
+			 * closed by the reader.
+			 *
+			 * If this points to a parent directory, @destname
+			 * will contain the target filename. If @destname is
+			 * empty, this points to the target file.
+			 */
+			int fd1;
+			/**
+			 * @fd2: For link or rename requests, a second file
+			 * descriptor for the target parent directory.  Must
+			 * be closed by the reader.  @destname contains the
+			 * destination filename.  This field is -1 if not
+			 * used.
+			 */
+			int fd2;

Can we just use one FD but identify the requested access instead and
send one event for each, like for the audit patch series?

I haven't managed to read or test out the audit patch yet (I will do), but I
think having the ability to specifically tell whether the child is trying to
move / rename / create a hard link of an existing file, and what it's trying
to use as destination, might be useful (either for security, or purely for
UX)?

For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
/tmp/innocent-tmp-file then read the latter. The supervisor can warn the
user on the initial link attempt, and the shenanigan will probably be
stopped there (although still, being able to say "[program] wants to link
~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
up allowed, later on for the read request it could say something like

	[program] wants to read /tmp/innocent-tmp-file
	    (previously moved from ~/.ssh/id_ecdsa)

Maybe this is a bit silly, but there might be other use cases for knowing
the exact details of a rename/link request, either for at-the-time decision
making, or tracking stuff for future requests?

This pattern looks like datagram packets.  I think we should use the
netlink attributes.  There were concern about using a netlink socket for
the seccomp unotification though:
https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@xxxxxxxxxxxxxx/

There are two main differences with seccomp unotify:
- the supervisor should be able to receive arbitrary-sized data (e.g.
   file name, not path);
- the supervisor should be able to receive file descriptors (instead of
   path).

Sockets are created with socket(2) whereas in our case we should only
get a supervisor FD (indirectly) through landlock_restrict_self(2),
which clearly identifies a kernel object.  Another issue would be to
deal with network namespaces, probably by creating a private one.
Sockets are powerful but we don't needs all the routing complexity.
Moreover, we should only need a blocking communication channel to avoid
issues managing in-flight object references (transformed to FDs when
received).  That makes me think that a socket might not be the right
construct, but we can still rely on the NLA macros to define a proper
protocol with dynamically-sized events, received and send with dedicated
IOCTL commands.

Netlink already provides a way to send a cookie, and
netlink_attribute_type defines the types we'll need, including string.

For instance, a link request/event could include 3 packets, one for each
of these properties:
1. the source file FD;
2. the destination directory FD;
3. the destination filename string.

This way we would avoid the union defined in this patch.

I had no idea about netlink - I will take a look.  Do you know if there 
is any existing code which uses it in a similar way (i.e. not creating 
an actual socket, but using netlink messages)?

I think in the end seccomp-unotify went with an ioctl with a custom 
struct seccomp_notif due to friction with the NL API [1] - do you think 
we will face the same problem here? (I will take a deeper look at 
netlink after sending this.)

(Tycho - could you weigh in?)

[1]: 
https://lore.kernel.org/all/CAGXu5jKsLDSBjB74SrvCvmGy_RTEjBsMtR5dk1CcRFrHEQfM_g@xxxxxxxxxxxxxx/

There is still the question about receiving FDs though. It would be nice
to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
sure how this could be properly handled wrt NLA.

Also, if we go with netlink messages, why do we need additional IOCTLs? 
Can we open the fd when we write out the message? (Maybe I will end up 
realizing the reason for this after reading netlink code, but I would )

I will try out the audit patch to see how things like these appears in the
log before commenting further on this. Maybe there is a way to achieve this
while still simplifying the event structure?

+			/**
+			 * @destname: A filename for a file creation target.
+			 *
+			 * If either of fd1 or fd2 points to a parent
+			 * directory rather than the target file, this is the
+			 * NULL-terminated name of the file that will be
+			 * newly created.
+			 *
+			 * Counting the NULL terminator, this field will
+			 * contain one or more NULL padding at the end so
+			 * that the length of the whole struct
+			 * landlock_supervise_event is a multiple of 8 bytes.
+			 *
+			 * This is a variable length member, and the length
+			 * including the terminating NULL(s) can be derived
+			 * from hdr.length - offsetof(struct
+			 * landlock_supervise_event, destname).
+			 */
+			char destname[];

I'd prefer to avoid sending file names for now.  I don't think it's
necessary, and that could encourage supervisors to filter access
according to names.

This is also motivated by the potential UX I'm thinking of. For example, if
a newly installed application tries to create ~/.app-name, it will be much
more reassuring and convenient to the user if we can show something like

	[program] wants to mkdir ~/.app-name. Allow this and future
	access to the new directory?

rather than just "[program] wants to mkdir under ~". (The "Allow this and
future access to the new directory" bit is made possible by the supervisor
knowing the name of the file/directory being created, and can remember them
/ write them out to a persistent profile etc)

Note that this is just the filename under the dir represented by fd - this
isn't a path or anything that can be subject to symlink-related attacks,
etc.  If a program calls e.g.
mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
my understanding is that fd1 will point to /some/dir, and destname would be
"stuff"

Right, this file name information would be useful.  In the case of
audit, the goal is to efficiently and asynchronously log security events
(and align with other LSM logs and related limitations), not primarily
to debug sandboxed apps nor to enrich this information for decision
making, but the supervisor feature would help here.  The patch message
should include this rationale.

Will do

Actually, in case your question is "why not send a fd to represent the newly
created file, instead of sending the name" -- I'm not sure whether you can
open even an O_PATH fd to a non-existent file.

That would not be possible because it would not exist yet, a file name
(not file path) is OK for this case.

+		};
+		struct {
+			__u16 port;
+		};
+	};
+};
+

[...]