On 19/03/2021 19:57, Kees Cook wrote: > On Tue, Mar 16, 2021 at 09:42:47PM +0100, Mickaël Salaün wrote: >> From: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx> >> >> Using Landlock objects and ruleset, it is possible to tag inodes >> according to a process's domain. To enable an unprivileged process to >> express a file hierarchy, it first needs to open a directory (or a file) >> and pass this file descriptor to the kernel through >> landlock_add_rule(2). When checking if a file access request is >> allowed, we walk from the requested dentry to the real root, following >> the different mount layers. The access to each "tagged" inodes are >> collected according to their rule layer level, and ANDed to create >> access to the requested file hierarchy. This makes possible to identify >> a lot of files without tagging every inodes nor modifying the >> filesystem, while still following the view and understanding the user >> has from the filesystem. >> >> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not >> keep the same struct inodes for the same inodes whereas these inodes are >> in use. >> >> This commit adds a minimal set of supported filesystem access-control >> which doesn't enable to restrict all file-related actions. This is the >> result of multiple discussions to minimize the code of Landlock to ease >> review. Thanks to the Landlock design, extending this access-control >> without breaking user space will not be a problem. Moreover, seccomp >> filters can be used to restrict the use of syscall families which may >> not be currently handled by Landlock. >> >> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >> Cc: Anton Ivanov <anton.ivanov@xxxxxxxxxxxxxxxxxx> >> Cc: James Morris <jmorris@xxxxxxxxx> >> Cc: Jann Horn <jannh@xxxxxxxxxx> >> Cc: Jeff Dike <jdike@xxxxxxxxxxx> >> Cc: Kees Cook <keescook@xxxxxxxxxxxx> >> Cc: Richard Weinberger <richard@xxxxxx> >> Cc: Serge E. Hallyn <serge@xxxxxxxxxx> >> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx> >> Link: https://lore.kernel.org/r/20210316204252.427806-8-mic@xxxxxxxxxxx >> [...] >> + spin_lock(&sb->s_inode_list_lock); >> + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { >> + struct landlock_object *object; >> + >> + /* Only handles referenced inodes. */ >> + if (!atomic_read(&inode->i_count)) >> + continue; >> + >> + /* >> + * Checks I_FREEING and I_WILL_FREE to protect against a race >> + * condition when release_inode() just called iput(), which >> + * could lead to a NULL dereference of inode->security or a >> + * second call to iput() for the same Landlock object. Also >> + * checks I_NEW because such inode cannot be tied to an object. >> + */ >> + spin_lock(&inode->i_lock); >> + if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) { >> + spin_unlock(&inode->i_lock); >> + continue; >> + } > > This (and elsewhere here) seems like a lot of inode internals getting > exposed. Can any of this be repurposed into helpers? I see this test > scattered around the kernel a fair bit: > > $ git grep I_FREEING | grep I_WILL_FREE | grep I_NEW | wc -l > 9 Dealing with the filesystem is complex. Some helpers could probably be added, but with a series dedicated to the filesystem. I can work on that once this series is merged. > >> +static inline u32 get_mode_access(const umode_t mode) >> +{ >> + switch (mode & S_IFMT) { >> + case S_IFLNK: >> + return LANDLOCK_ACCESS_FS_MAKE_SYM; >> + case 0: >> + /* A zero mode translates to S_IFREG. */ >> + case S_IFREG: >> + return LANDLOCK_ACCESS_FS_MAKE_REG; >> + case S_IFDIR: >> + return LANDLOCK_ACCESS_FS_MAKE_DIR; >> + case S_IFCHR: >> + return LANDLOCK_ACCESS_FS_MAKE_CHAR; >> + case S_IFBLK: >> + return LANDLOCK_ACCESS_FS_MAKE_BLOCK; >> + case S_IFIFO: >> + return LANDLOCK_ACCESS_FS_MAKE_FIFO; >> + case S_IFSOCK: >> + return LANDLOCK_ACCESS_FS_MAKE_SOCK; >> + default: >> + WARN_ON_ONCE(1); >> + return 0; >> + } > > I'm assuming this won't be reachable from userspace. It should not, only a bogus kernel code could. > >> [...] >> index a5d6ef334991..f8e8e980454c 100644 >> --- a/security/landlock/setup.c >> +++ b/security/landlock/setup.c >> @@ -11,17 +11,24 @@ >> >> #include "common.h" >> #include "cred.h" >> +#include "fs.h" >> #include "ptrace.h" >> #include "setup.h" >> >> +bool landlock_initialized __lsm_ro_after_init = false; >> + >> struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { >> .lbs_cred = sizeof(struct landlock_cred_security), >> + .lbs_inode = sizeof(struct landlock_inode_security), >> + .lbs_superblock = sizeof(struct landlock_superblock_security), >> }; >> >> static int __init landlock_init(void) >> { >> landlock_add_cred_hooks(); >> landlock_add_ptrace_hooks(); >> + landlock_add_fs_hooks(); >> + landlock_initialized = true; > > I think this landlock_initialized is logically separate from the optional > DEFINE_LSM "enabled" variable, but I thought I'd double check. :) An LSM can be marked as enabled (at boot) but not yet initialized. > > It seems like it's used here to avoid releasing superblocks before > landlock_init() is called? What is the scenario where that happens? It is a condition for LSM hooks, syscalls and superblock management. > >> pr_info("Up and running.\n"); >> return 0; >> } >> diff --git a/security/landlock/setup.h b/security/landlock/setup.h >> index 9fdbf33fcc33..1daffab1ab4b 100644 >> --- a/security/landlock/setup.h >> +++ b/security/landlock/setup.h >> @@ -11,6 +11,8 @@ >> >> #include <linux/lsm_hooks.h> >> >> +extern bool landlock_initialized; >> + >> extern struct lsm_blob_sizes landlock_blob_sizes; >> >> #endif /* _SECURITY_LANDLOCK_SETUP_H */ >> -- >> 2.30.2 >> > > The locking and inode semantics are pretty complex, but since, again, > it's got significant test and syzkaller coverage, it looks good to me. > > With the inode helper cleanup: > > Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx> >