This patchset addresses a race condition we've recently run into with
seccomp: programs interrupting syscalls while they're in progress. This
was exacerbated by Golang's[1] recent adoption of "Non-cooperative
goroutine preemption", in which the runtime tries to interrupt any
syscall that has been running for more than 10ms. For certain syscalls
(mount, for example), it is non-trivial to write the userspace handling
in a reentrant manner.

The patchset allows a per-filter flag to be set that makes the notifying
process switch to TASK_KILLABLE, as opposed to returning to userspace on
non-fatal signals.

Changes since v3[4]:
 * Clean up tests
 * Split out helper function (dedupe code)
 * Add some explanation about what's going on
 * Small documentation edit

Changes since v2[3]:
 * Split out addfd patches
 * Move the flag to be per-filter (as opposed to per notification)

Changes since v1[2]:
 * Fix some documentation
 * Add Rata's patches to allow for direct return from addfd

[1]: https://github.com/golang/proposal/blob/master/design/24543-non-cooperative-preemption.md
[2]: https://lore.kernel.org/lkml/20210220090502.7202-1-sargun@xxxxxxxxx/
[3]: https://lore.kernel.org/all/20210426180610.2363-1-sargun@xxxxxxxxx/
[4]: https://lore.kernel.org/lkml/20220429023113.74993-1-sargun@xxxxxxxxx/

Sargun Dhillon (3):
  seccomp: Add wait_killable semantic to seccomp user notifier
  selftests/seccomp: Refactor get_proc_stat to split out file reading
    code
  selftests/seccomp: Add test for wait killable notifier

 .../userspace-api/seccomp_filter.rst          |  10 +
 include/linux/seccomp.h                       |   3 +-
 include/uapi/linux/seccomp.h                  |   2 +
 kernel/seccomp.c                              |  42 ++-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 282 +++++++++++++++++-
 5 files changed, 320 insertions(+), 19 deletions(-)

-- 
2.25.1