On Sun, Oct 13, 2024 at 11:25:11AM +0200, Amir Goldstein wrote: > On Fri, Oct 11, 2024 at 8:45 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote: > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > allowed for execution. The main use case is for script interpreters and > > dynamic linkers to check execution permission according to the kernel's > > security policy. Another use case is to add context to access logs e.g., > > which script (instead of interpreter) accessed a file. As any > > executable code, scripts could also use this check [1]. > > > > This is different from faccessat(2) + X_OK which only checks a subset of > > access rights (i.e. inode permission and mount options for regular > > files), but not the full context (e.g. all LSM access checks). The main > > use case for access(2) is for SUID processes to (partially) check access > > on behalf of their caller. The main use case for execveat(2) + AT_CHECK > > is to check if a script execution would be allowed, according to all the > > different restrictions in place. Because the use of AT_CHECK follows > > the exact kernel semantic as for a real execution, user space gets the > > same error codes. > > > > An interesting point of using execveat(2) instead of openat2(2) is that > > it decouples the check from the enforcement. Indeed, the security check > > can be logged (e.g. with audit) without blocking an execution > > environment not yet ready to enforce a strict security policy. > > > > LSMs can control or log execution requests with > > security_bprm_creds_for_exec(). However, to enforce a consistent and > > complete access control (e.g. on binary's dependencies) LSMs should > > restrict file executability, or mesure executed files, with > > security_file_open() by checking file->f_flags & __FMODE_EXEC. > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > make sense for the kernel to parse the checked files, look for > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > if the format is unknown. Because of that, security_bprm_check() is > > never called when AT_CHECK is used. > > > > It should be noted that script interpreters cannot directly use > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > executed to interpret the script. Unlike the kernel, script > > interpreters may just interpret the shebang as a simple comment, which > > should not change for backward compatibility reasons. > > > > Because scripts or libraries files might not currently have the > > executable permission set, or because we might want specific users to be > > allowed to run arbitrary scripts, the following patch provides a dynamic > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and > > SECBIT_EXEC_DENY_INTERACTIVE securebits. > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > This patch has been used for more than a decade with customized script > > interpreters. Some examples can be found here: > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > > Cc: Christian Brauner <brauner@xxxxxxxxxx> > > Cc: Kees Cook <keescook@xxxxxxxxxxxx> > > Cc: Paul Moore <paul@xxxxxxxxxxxxxx> > > Cc: Serge Hallyn <serge@xxxxxxxxxx> > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxx> > > Link: https://lore.kernel.org/r/20241011184422.977903-2-mic@xxxxxxxxxxx > > --- > > > > Changes since v19: > > * Remove mention of "role transition" as suggested by Andy. > > * Highlight the difference between security_bprm_creds_for_exec() and > > the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as > > discussed with Jeff. > > * Improve documentation both in UAPI comments and kernel comments > > (requested by Kees). > > > > New design since v18: > > https://lore.kernel.org/r/20220104155024.48023-3-mic@xxxxxxxxxxx > > --- > > fs/exec.c | 18 ++++++++++++++++-- > > include/linux/binfmts.h | 7 ++++++- > > include/uapi/linux/fcntl.h | 31 +++++++++++++++++++++++++++++++ > > kernel/audit.h | 1 + > > kernel/auditsc.c | 1 + > > security/security.c | 10 ++++++++++ > > 6 files changed, 65 insertions(+), 3 deletions(-) > > > > diff --git a/fs/exec.c b/fs/exec.c > > index 6c53920795c2..163c659d9ae6 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -891,7 +891,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > .lookup_flags = LOOKUP_FOLLOW, > > }; > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > return ERR_PTR(-EINVAL); > > if (flags & AT_SYMLINK_NOFOLLOW) > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > @@ -1545,6 +1545,20 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > } > > bprm->interp = bprm->filename; > > > > + /* > > + * At this point, security_file_open() has already been called (with > > + * __FMODE_EXEC) and access control checks for AT_CHECK will stop just > > + * after the security_bprm_creds_for_exec() call in bprm_execve(). > > + * Indeed, the kernel should not try to parse the content of the file > > + * with exec_binprm() nor change the calling thread, which means that > > + * the following security functions will be not called: > > + * - security_bprm_check() > > + * - security_bprm_creds_from_file() > > + * - security_bprm_committing_creds() > > + * - security_bprm_committed_creds() > > + */ > > + bprm->is_check = !!(flags & AT_CHECK); > > + > > retval = bprm_mm_init(bprm); > > if (!retval) > > return bprm; > > @@ -1839,7 +1853,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > /* Set the unchanging part of bprm->cred */ > > retval = security_bprm_creds_for_exec(bprm); > > - if (retval) > > + if (retval || bprm->is_check) > > goto out; > > > > retval = exec_binprm(bprm); > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > index e6c00e860951..8ff0eb3644a1 100644 > > --- a/include/linux/binfmts.h > > +++ b/include/linux/binfmts.h > > @@ -42,7 +42,12 @@ struct linux_binprm { > > * Set when errors can no longer be returned to the > > * original userspace. > > */ > > - point_of_no_return:1; > > + point_of_no_return:1, > > + /* > > + * Set by user space to check executability according to the > > + * caller's environment. > > + */ > > + is_check:1; > > struct file *executable; /* Executable to pass to the interpreter */ > > struct file *interpreter; > > struct file *file; > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > index 87e2dec79fea..e606815b1c5a 100644 > > --- a/include/uapi/linux/fcntl.h > > +++ b/include/uapi/linux/fcntl.h > > @@ -154,6 +154,37 @@ > > usable with open_by_handle_at(2). */ > > #define AT_HANDLE_MNT_ID_UNIQUE 0x001 /* Return the u64 unique mount ID. */ > > > > +/* > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > > + * of this file would be allowed, ignoring the file format and then the related > > + * interpreter dependencies (e.g. ELF libraries, script's shebang). > > + * > > + * Programs should always perform this check to apply kernel-level checks > > + * against files that are not directly executed by the kernel but passed to a > > + * user space interpreter instead. All files that contain executable code, > > + * from the point of view of the interpreter, should be checked. However the > > + * result of this check should only be enforced according to > > + * SECBIT_EXEC_RESTRICT_FILE or SECBIT_EXEC_DENY_INTERACTIVE. See securebits.h > > + * documentation and the samples/check-exec/inc.c example. > > + * > > + * The main purpose of this flag is to improve the security and consistency of > > + * an execution environment to ensure that direct file execution (e.g. > > + * `./script.sh`) and indirect file execution (e.g. `sh script.sh`) lead to the > > + * same result. For instance, this can be used to check if a file is > > + * trustworthy according to the caller's environment. > > + * > > + * In a secure environment, libraries and any executable dependencies should > > + * also be checked. For instance, dynamic linking should make sure that all > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > > + * LD_PRELOAD). For such secure execution environment to make sense, only > > + * trusted code should be executable, which also requires integrity guarantees. > > + * > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > + * descriptor instead of a path. > > + */ > > If you ask me, the very elaborate comment above belongs to execveat(2) > man page and is way too verbose for a uapi header. OK, but since this new flags raised a lot of questions, I guess a dedicated Documentation/userspace-api/check-exec.rst file with thit AT*_CHECK and the related securebits would be useful instead of the related inlined documentation. > > > +#define AT_CHECK 0x10000 > > Please see the comment "Per-syscall flags for the *at(2) family of syscalls." > above. If this is a per-syscall flag please use one of the per-syscall > flags, e.g.: > > /* Flags for execveat2(2) */ > #define AT_EXECVE_CHECK 0x0001 /* Only perform a check if > execution would be allowed */ I missed this part, this prefix makes sense, thanks. > > > Thanks, > Amir.