On 28/02/2017 21:01, Andy Lutomirski wrote:
> On Tue, Feb 21, 2017 at 5:26 PM, Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
>> The seccomp(2) syscall can be use to apply a Landlock rule to the
>> current process. As with a seccomp filter, the Landlock rule is enforced
>> for all its future children. An inherited rule tree can be updated
>> (append-only) by the owner of inherited Landlock nodes (e.g. a parent
>> process that create a new rule)
>
> Can you clarify exaclty what this type of update does? Is it
> something that should be supported by normal seccomp rules as well?

There are two main structures involved here: struct landlock_node and
struct landlock_rule, both defined in include/linux/landlock.h [02/10].

Let's take an example with a seccomp filter and then with Landlock:

* seccomp filter: Process P1 creates and applies a seccomp filter F1 to
  itself. Then it forks and creates a child P2, which inherits P1's
  filters, hence F1. Now, if P1 adds a new seccomp filter F2 to itself,
  P2 *won't get it*. P2's filter list will still contain only F1, not
  F2. If P2 sets up and applies a new filter F3 to itself, its filter
  list will contain F1 and F3.

* Landlock: Process P1 creates and applies a Landlock rule R1 to itself.
  Underneath, the kernel creates a new node N1 dedicated to P1, which
  contains all its rules. Then P1 forks and creates a child P2, which
  inherits P1's rules, hence R1. Underneath, P2 inherits N1. Now, if P1
  adds a new Landlock rule R2 to itself, P2 *will get it* as well
  (because R2 is part of N1). If P2 creates and applies a new rule R3 to
  itself, its rules will contain R1, R2 and R3. Underneath, the kernel
  creates a new node N2 for P2, which contains only R3 but inherits from
  (links to) N1.

This design makes it possible for a process to add more constraints to
its children on the fly. I think it is a good feature to have and a
safer default inheritance mechanism, but it could be guarded by an
option flag if we want both mechanisms to be available.
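To make the Landlock case above concrete, here is a minimal userspace
sketch of the node inheritance. It is only a model, not the code from
the patch: struct rule, struct node, struct task, the owns_node flag and
all helper names are hypothetical simplifications of struct
landlock_rule, struct landlock_node and the owner pointer, and the
sketch ignores reference counting, locking and freeing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a rule: rules in a node form an append-only list. */
struct rule {
	int id;
	struct rule *prev;	/* older rule in the same node */
};

/* Hypothetical model of a node: a rule list plus a link to the inherited node. */
struct node {
	struct rule *rules;	/* newest rule first */
	struct node *prev;	/* node inherited from an ancestor, or NULL */
};

/* Hypothetical model of a task: owns_node stands in for the owner pointer. */
struct task {
	struct node *node;
	int owns_node;
};

/* fork: the child shares the parent's node but does not own it. */
struct task task_fork(const struct task *parent)
{
	struct task child = { parent->node, 0 };

	return child;
}

/*
 * Add a rule to a task. The owner appends to its own node, so every task
 * sharing that node sees the rule. A non-owner first gets a new node that
 * links back to the inherited one. Memory is never freed in this sketch.
 */
void task_add_rule(struct task *t, int id)
{
	struct rule *r = malloc(sizeof(*r));

	assert(r);
	if (!t->owns_node) {
		struct node *n = malloc(sizeof(*n));

		assert(n);
		n->rules = NULL;
		n->prev = t->node;
		t->node = n;
		t->owns_node = 1;
	}
	r->id = id;
	r->prev = t->node->rules;
	t->node->rules = r;
}

/* Return 1 if the rule applies to the task, walking all inherited nodes. */
int task_has_rule(const struct task *t, int id)
{
	const struct node *n;
	const struct rule *r;

	for (n = t->node; n; n = n->prev)
		for (r = n->rules; r; r = r->prev)
			if (r->id == id)
				return 1;
	return 0;
}

/* Count the rules that apply to the task, including inherited ones. */
int task_rule_count(const struct task *t)
{
	const struct node *n;
	const struct rule *r;
	int count = 0;

	for (n = t->node; n; n = n->prev)
		for (r = n->rules; r; r = r->prev)
			count++;
	return count;
}
```

With this model, if P1 applies R1, forks P2 and then applies R2, P2 is
constrained by both R1 and R2 because both live in the shared node. When
P2 then applies R3, it gets a second node that links back to the first,
so P1 is not affected by R3.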
The same design could be used by seccomp filters too.

>
>> +/**
>> + * landlock_run_prog - run Landlock program for a syscall
>
> Unless this is actually specific to syscalls, s/for a syscall//, perhaps?

Right, it is not specific to syscalls anymore.

>
>> +		if (new_events->nodes[event_idx]->owner ==
>> +				&new_events->nodes[event_idx]) {
>> +			/* We are the owner, we can then update the node. */
>> +			add_landlock_rule(new_events, rule);
>
> This is the part I don't get. Adding a rule if you're the owner (BTW,
> why is ownership visible to userspace at all?) for just yourself and
> future children is very different from adding it so it applies to
> preexisting children too.

Node ownership is not (directly) visible to userspace.

The current inheritance mechanism does not allow adding a rule to the
current process only. The rule will also be inherited by its children
(starting from the children created after the first applied rule). An
option flag NEW_RULE_HIERARCHY (or maybe another seccomp operation)
could make it possible to create a new node for the current process, so
that the new rule is not inherited by the previously created children.

>
>
>> +		} else if (atomic_read(&current_events->usage) == 1) {
>> +			WARN_ON(new_events->nodes[event_idx]->owner);
>> +			/*
>> +			 * We can become the new owner if no other task use it.
>> +			 * This avoid an unnecessary allocation.
>> +			 */
>> +			new_events->nodes[event_idx]->owner =
>> +				&new_events->nodes[event_idx];
>> +			add_landlock_rule(new_events, rule);
>> +		} else {
>> +			/*
>> +			 * We are not the owner, we need to fork current_events
>> +			 * and then add a new node.
>> +			 */
>> +			struct landlock_node *node;
>> +			size_t i;
>> +
>> +			node = kmalloc(sizeof(*node), GFP_KERNEL);
>> +			if (!node) {
>> +				new_events = ERR_PTR(-ENOMEM);
>> +				goto put_rule;
>> +			}
>> +			atomic_set(&node->usage, 1);
>> +			/* set the previous node after the new_events
>> +			 * allocation */
>> +			node->prev = NULL;
>> +			/* do not increment the previous node usage */
>> +			node->owner = &new_events->nodes[event_idx];
>> +			/* rule->prev is already NULL */
>> +			atomic_set(&rule->usage, 1);
>> +			node->rule = rule;
>> +
>> +			new_events = new_raw_landlock_events();
>> +			if (IS_ERR(new_events)) {
>> +				/* put the rule as well */
>> +				put_landlock_node(node);
>> +				return ERR_PTR(-ENOMEM);
>> +			}
>> +			for (i = 0; i < ARRAY_SIZE(new_events->nodes); i++) {
>> +				new_events->nodes[i] =
>> +					lockless_dereference(
>> +							current_events->nodes[i]);
>> +				if (i == event_idx)
>> +					node->prev = new_events->nodes[i];
>> +				if (!WARN_ON(!new_events->nodes[i]))
>> +					atomic_inc(&new_events->nodes[i]->usage);
>> +			}
>> +			new_events->nodes[event_idx] = node;
>> +
>> +			/*
>> +			 * @current_events will not be freed here because it's usage
>> +			 * field is > 1. It is only prevented to be freed by another
>> +			 * subject thanks to the caller of landlock_append_prog() which
>> +			 * should be locked if needed.
>> +			 */
>> +			put_landlock_events(current_events);
>> +		}
>> +	}
>> +	return new_events;
>> +
>> +put_prog:
>> +	bpf_prog_put(prog);
>> +	return new_events;
>> +
>> +put_rule:
>> +	put_landlock_rule(rule);
>> +	return new_events;
>> +}
>> +
>> +/**
>> + * landlock_seccomp_append_prog - attach a Landlock rule to the current process
>> + *
>> + * current->seccomp.landlock_events is lazily allocated. When a process fork,
>> + * only a pointer is copied. When a new event is added by a process, if there
>> + * is other references to this process' landlock_events, then a new allocation
>> + * is made to contains an array pointing to Landlock rule lists. This design
>> + * has low-performance impact and is memory efficient while keeping the
>> + * property of append-only rules.
>> + *
>> + * @flags: not used for now, but could be used for TSYNC
>> + * @user_bpf_fd: file descriptor pointing to a loaded Landlock rule
>> + */
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +int landlock_seccomp_append_prog(unsigned int flags, const char __user *user_bpf_fd)
>> +{
>> +	struct landlock_events *new_events;
>> +	struct bpf_prog *prog;
>> +	int bpf_fd;
>> +
>> +	/* force no_new_privs to limit privilege escalation */
>> +	if (!task_no_new_privs(current))
>> +		return -EPERM;
>> +	/* will be removed in the future to allow unprivileged tasks */
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EPERM;
>> +	if (!user_bpf_fd)
>> +		return -EFAULT;
>> +	if (flags)
>> +		return -EINVAL;
>> +	if (copy_from_user(&bpf_fd, user_bpf_fd, sizeof(bpf_fd)))
>> +		return -EFAULT;
>> +	prog = bpf_prog_get(bpf_fd);
>> +	if (IS_ERR(prog))
>> +		return PTR_ERR(prog);
>> +
>> +	/*
>> +	 * We don't need to lock anything for the current process hierarchy,
>> +	 * everything is guarded by the atomic counters.
>> +	 */
>> +	new_events = landlock_append_prog(current->seccomp.landlock_events, prog);
>
> Do you need to check that it's the right *kind* of bpf prog or is that
> handled elsewhere?

The program type is checked at the beginning of landlock_append_prog().

 Mickaël