On Mon, Aug 7, 2017 at 6:23 PM, Tyler Hicks <tyhicks@xxxxxxxxxxxxx> wrote: > On 08/02/2017 10:19 PM, Kees Cook wrote: >> Right now, SECCOMP_RET_KILL kills the current thread. There have been >> a few requests for RET_KILL to kill the entire process (the thread >> group), but since seccomp's u32 return values are ABI, and ordered by >> lowest value, with RET_KILL as 0, there isn't a trivial way to provide >> an even smaller value that would mean the more restrictive action of >> killing the thread group. >> >> Instead, create a filter flag that indicates that a RET_KILL from this >> filter must kill the process rather than the thread. This can be set >> (and not cleared) via the new SECCOMP_FILTER_FLAG_KILL_PROCESS flag. >> >> Pros: >> - the logic for the filter action is contained in the filter. >> - userspace can detect support for the feature since earlier kernels >> will reject the new flag. >> Cons: >> - depends on adding an assignment to the seccomp_run_filters() loop >> (previous patch). >> >> Alternatives to this approach with pros/cons: >> >> - Use a new test during seccomp_run_filters() that treats the RET_DATA >> mask of a RET_KILL action as special. If a new bit is set in the data, >> then treat the return value as -1 (lower than 0). >> Pros: >> - the logic for the filter action is contained in the filter. >> Cons: >> - added complexity to time-sensitive seccomp_run_filters() loop. >> - there isn't a trivial way for userspace to detect if the kernel >> supports the feature (earlier kernels will silently ignore the >> RET_DATA and only kill the thread). > > I prefer using a filter flag over a special RET_DATA mask but, for > completeness, I wanted to mention that SECCOMP_GET_ACTION_AVAIL > operation could be extended to validate special RET_DATA masks. However, > I don't think that is a clean design. > >> - Have SECCOMP_FILTER_FLAG_KILL_PROCESS attach to the seccomp struct >> rather than the filter. >> Pros: >> - no change needed to seccomp_run_filters() loop. >> Cons: >> - the change in behavior technically originates external to the >> filter, which allows for later filters to "enhance" a previously >> applied filter's RET_KILL to kill the entire process, which may >> be unexpected. >> >> Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx> >> --- >> include/linux/seccomp.h | 3 ++- >> include/uapi/linux/seccomp.h | 3 ++- >> kernel/seccomp.c | 12 +++++++++++- >> 3 files changed, 15 insertions(+), 3 deletions(-) >> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h >> index ecc296c137cd..59d001ba655c 100644 >> --- a/include/linux/seccomp.h >> +++ b/include/linux/seccomp.h >> @@ -3,7 +3,8 @@ >> >> #include <uapi/linux/seccomp.h> >> >> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC) >> +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \ >> + SECCOMP_FILTER_FLAG_KILL_PROCESS) >> >> #ifdef CONFIG_SECCOMP >> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h >> index 0f238a43ff1e..4b75d8c297b6 100644 >> --- a/include/uapi/linux/seccomp.h >> +++ b/include/uapi/linux/seccomp.h >> @@ -15,7 +15,8 @@ >> #define SECCOMP_SET_MODE_FILTER 1 >> >> /* Valid flags for SECCOMP_SET_MODE_FILTER */ >> -#define SECCOMP_FILTER_FLAG_TSYNC 1 >> +#define SECCOMP_FILTER_FLAG_TSYNC 1 >> +#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2 >> >> /* >> * All BPF programs must return a 32-bit value. >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c >> index 8bdcf01379e4..931eb9cbd093 100644 >> --- a/kernel/seccomp.c >> +++ b/kernel/seccomp.c >> @@ -44,6 +44,7 @@ >> * is only needed for handling filters shared across tasks. >> * @prev: points to a previously installed, or inherited, filter >> * @prog: the BPF program to evaluate >> + * @kill_process: if true, RET_KILL will kill process rather than thread. >> * >> * seccomp_filter objects are organized in a tree linked via the @prev >> * pointer. For any task, it appears to be a singly-linked list starting >> @@ -59,6 +60,7 @@ struct seccomp_filter { >> refcount_t usage; >> struct seccomp_filter *prev; >> struct bpf_prog *prog; >> + bool kill_process; > > Just a reminder to move bool up to be the 2nd member of the struct for > improved struct packing. (You already said you were going to this while > you were reviewing my logging patches) Thanks! Yeah, done now. >> }; >> >> /* Limit any path through the tree to 256KB worth of instructions. */ >> @@ -448,6 +450,10 @@ static long seccomp_attach_filter(unsigned int flags, >> return ret; >> } >> >> + /* Set process-killing flag, if present. */ >> + if (flags & SECCOMP_FILTER_FLAG_KILL_PROCESS) >> + filter->kill_process = true; >> + >> /* >> * If there is an existing filter, make it the prev and don't drop its >> * task reference. >> @@ -658,7 +664,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, >> seccomp_init_siginfo(&info, this_syscall, data); >> do_coredump(&info); >> } >> - do_exit(SIGSYS); >> + /* Kill entire thread group if requested (or went haywire). */ >> + if (!match || match->kill_process) > > I found this conditional to be a little surprising. I would have expected: > > if (match && match->kill_process) > do_group_exit(SIGSYS); > else > do_exit(SIGSYS); > > A resulting NULL match pointer is unexpected but callers of > seccomp_run_filters() are expected to handle such a situation so it is > possible. Are you preferring to fail closed as much as possible > (considering that killing the process is more restrictive than killing > the thread)? Yeah, that was my thinking. The only way for this to happen is if we have a totally impossible state: NULL filter in struct seccomp but seccomp mode is non-zero. I'll add a comment above this. > I don't know the specific situations that would result in the match > pointer not being set in seccomp_run_filters() but I'm curious if > existing, old userspace code that only expected the thread to be killed > could potentially tickle such a situation and result in the entire > thread group being unexpectedly killed. There should be no way for userspace to reach this condition. It shouldn't ever be possible to set the seccomp mode without a filter attached. > FWIW, I like the way the conditional is written and am only thinking out > loud about the possible side effects. Yup, very appreciated! > Side note: I initially expected to see > Documentation/userspace-api/seccomp_filter.rst being updated in this > patch and then remembered that seccomp(2) isn't documented in that file > and, therefore, it isn't trivial to add blurbs about filter flags in > there. We should fix that after the dust settles on these two patch sets. Hm, excellent point. Agreed, let's fix that as a separate step. -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html