On Fri, Oct 9, 2020 at 7:15 PM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote:
> Currently the kernel does not provide an infrastructure to translate
> architecture numbers to a human-readable name. Translating syscall
> numbers to syscall names is possible through FTRACE_SYSCALL
> infrastructure but it does not provide support for compat syscalls.
>
> This will create a file for each PID as /proc/pid/seccomp_cache.
> The file will be empty when no seccomp filters are loaded, or be
> in the format of:
> <arch name> <decimal syscall number> <ALLOW | FILTER>
> where ALLOW means the cache is guaranteed to allow the syscall,
> and filter means the cache will pass the syscall to the BPF filter.
>
> For the docker default profile on x86_64 it looks like:
> x86_64 0 ALLOW
> x86_64 1 ALLOW
> x86_64 2 ALLOW
> x86_64 3 ALLOW
> [...]
> x86_64 132 ALLOW
> x86_64 133 ALLOW
> x86_64 134 FILTER
> x86_64 135 FILTER
> x86_64 136 FILTER
> x86_64 137 ALLOW
> x86_64 138 ALLOW
> x86_64 139 FILTER
> x86_64 140 ALLOW
> x86_64 141 ALLOW
> [...]
>
> This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default
> of N because I think certain users of seccomp might not want the
> application to know which syscalls are definitely usable. For
> the same reason, it is also guarded by CAP_SYS_ADMIN.
>
> Suggested-by: Jann Horn <jannh@xxxxxxxxxx>
> Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@xxxxxxxxxxxxxx/
> Signed-off-by: YiFei Zhu <yifeifz2@xxxxxxxxxxxx>
[...]
> diff --git a/arch/Kconfig b/arch/Kconfig
[...]
> +config SECCOMP_CACHE_DEBUG
> +	bool "Show seccomp filter cache status in /proc/pid/seccomp_cache"
> +	depends on SECCOMP
> +	depends on SECCOMP_FILTER
> +	depends on PROC_FS
> +	help
> +	  This is enables /proc/pid/seccomp_cache interface to monitor

nit: s/This is enables/This enables the/

> +	  seccomp cache data. The file format is subject to change. Reading
> +	  the file requires CAP_SYS_ADMIN.
> +
> +	  This option is for debugging only. Enabling present the risk that

nit: *presents

> +	  an adversary may be able to infer the seccomp filter logic.
[...]
> +int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns,
> +			   struct pid *pid, struct task_struct *task)
> +{
> +	struct seccomp_filter *f;
> +	unsigned long flags;
> +
> +	/*
> +	 * We don't want some sandboxed process know what their seccomp

s/know/to know/

> +	 * filters consist of.
> +	 */
> +	if (!file_ns_capable(m->file, &init_user_ns, CAP_SYS_ADMIN))
> +		return -EACCES;
> +
> +	if (!lock_task_sighand(task, &flags))
> +		return 0;

maybe return -ESRCH here so that userspace can distinguish between an
exiting process and a process with no filters?

> +	f = READ_ONCE(task->seccomp.filter);
> +	if (!f) {
> +		unlock_task_sighand(task, &flags);
> +		return 0;
> +	}
[...]
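
For what it's worth, here is a minimal userspace sketch of how a debugging
tool might parse one line of the "<arch name> <decimal syscall number>
<ALLOW | FILTER>" format from the commit message. The names
parse_cache_line, struct cache_entry and enum cache_verdict are made up for
illustration and are not part of the patch, and the 31-character limit on
the arch name is my assumption. Since the Kconfig help says the file format
is subject to change, treat this as a sketch only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical types, not from the patch. */
enum cache_verdict { CACHE_ALLOW, CACHE_FILTER };

struct cache_entry {
	char arch[32];			/* assumed upper bound on arch name length */
	long nr;			/* decimal syscall number */
	enum cache_verdict verdict;
};

/*
 * Parse one line of the proposed /proc/pid/seccomp_cache format:
 *   <arch name> <decimal syscall number> <ALLOW | FILTER>
 * Returns 0 on success, -1 if one of the three fields is missing or the
 * verdict is neither ALLOW nor FILTER. Anything after the third field is
 * ignored.
 */
static int parse_cache_line(const char *line, struct cache_entry *e)
{
	char verdict[16];

	if (sscanf(line, "%31s %ld %15s", e->arch, &e->nr, verdict) != 3)
		return -1;

	if (strcmp(verdict, "ALLOW") == 0)
		e->verdict = CACHE_ALLOW;
	else if (strcmp(verdict, "FILTER") == 0)
		e->verdict = CACHE_FILTER;
	else
		return -1;

	return 0;
}
```

With the sample output quoted above, parse_cache_line("x86_64 134 FILTER", &e)
would fill in arch "x86_64", nr 134 and CACHE_FILTER. A caller reading the
file would still have to handle the empty-file case separately, which is
where a distinct error for an exiting task would help.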