On Wed, Sep 30, 2020 at 5:20 PM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote: > Currently the kernel does not provide an infrastructure to translate > architecture numbers to a human-readable name. Translating syscall > numbers to syscall names is possible through FTRACE_SYSCALL > infrastructure but it does not provide support for compat syscalls. > > This will create a file for each PID as /proc/pid/seccomp_cache. > The file will be empty when no seccomp filters are loaded, or be > in the format of: > <arch name> <decimal syscall number> <ALLOW | FILTER> > where ALLOW means the cache is guaranteed to allow the syscall, > and filter means the cache will pass the syscall to the BPF filter. > > For the docker default profile on x86_64 it looks like: > x86_64 0 ALLOW > x86_64 1 ALLOW > x86_64 2 ALLOW > x86_64 3 ALLOW > [...] > x86_64 132 ALLOW > x86_64 133 ALLOW > x86_64 134 FILTER > x86_64 135 FILTER > x86_64 136 FILTER > x86_64 137 ALLOW > x86_64 138 ALLOW > x86_64 139 FILTER > x86_64 140 ALLOW > x86_64 141 ALLOW > [...] Oooh, neat! :) Thanks! > Suggested-by: Jann Horn <jannh@xxxxxxxxxx> > Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@xxxxxxxxxxxxxx/ > Signed-off-by: YiFei Zhu <yifeifz2@xxxxxxxxxxxx> > --- > arch/Kconfig | 15 +++++++++++ > arch/x86/include/asm/seccomp.h | 3 +++ > fs/proc/base.c | 3 +++ > include/linux/seccomp.h | 5 ++++ > kernel/seccomp.c | 46 ++++++++++++++++++++++++++++++++++ > 5 files changed, 72 insertions(+) > > diff --git a/arch/Kconfig b/arch/Kconfig > index ca867b2a5d71..b840cadcc882 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -478,6 +478,7 @@ config HAVE_ARCH_SECCOMP_CACHE_NR_ONLY > - all the requirements for HAVE_ARCH_SECCOMP_FILTER > - SECCOMP_ARCH_DEFAULT > - SECCOMP_ARCH_DEFAULT_NR > + - SECCOMP_ARCH_DEFAULT_NAME > > config SECCOMP > prompt "Enable seccomp to safely execute untrusted bytecode" > @@ -532,6 +533,20 @@ config SECCOMP_CACHE_NR_ONLY > > endchoice > > +config DEBUG_SECCOMP_CACHE > + bool "Show seccomp filter cache status in /proc/pid/seccomp_cache" > + depends on SECCOMP_CACHE_NR_ONLY > + depends on PROC_FS > + help > + This is enables /proc/pid/seccomp_cache interface to monitor nit: s/is enables/enables/ > + seccomp cache data. The file format is subject to change. Reading > + the file requires CAP_SYS_ADMIN. > + > + This option is for debugging only. Enabling present the risk that > + an adversary may be able to infer the seccomp filter logic. > + > + If unsure, say N. > + [...] > diff --git a/kernel/seccomp.c b/kernel/seccomp.c [...] > +int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns, > + struct pid *pid, struct task_struct *task) > +{ > + struct seccomp_filter *f; > + > + /* > + * We don't want some sandboxed process know what their seccomp > + * filters consist of. > + */ > + if (!file_ns_capable(m->file, &init_user_ns, CAP_SYS_ADMIN)) > + return -EACCES; > + > + f = READ_ONCE(task->seccomp.filter); > + if (!f) > + return 0; Hmm, this won't work, because the task could be exiting, and seccomp filters are detached in release_task() (using seccomp_filter_release()). And at the moment, seccomp_filter_release() just locklessly NULLs out the tsk->seccomp.filter pointer and drops the reference. The locking here is kind of gross, but basically I think you can change this code to use lock_task_sighand() / unlock_task_sighand() (see the other examples in fs/proc/base.c), and bail out if lock_task_sighand() returns NULL. And in seccomp_filter_release(), add something like this: /* We are effectively holding the siglock by not having any sighand. */ WARN_ON(tsk->sighand != NULL); > +#ifdef SECCOMP_ARCH_DEFAULT > + proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_DEFAULT_NAME, > + f->cache.syscall_allow_default, > + SECCOMP_ARCH_DEFAULT_NR); > +#endif /* SECCOMP_ARCH_DEFAULT */ > + > +#ifdef SECCOMP_ARCH_COMPAT > + proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_COMPAT_NAME, > + f->cache.syscall_allow_compat, > + SECCOMP_ARCH_COMPAT_NR); > +#endif /* SECCOMP_ARCH_COMPAT */ > + return 0; > +} > +#endif /* CONFIG_DEBUG_SECCOMP_CACHE */ > -- > 2.28.0 >