On Thu, Sep 24, 2020 at 07:44:21AM -0500, YiFei Zhu wrote: > From: YiFei Zhu <yifeifz2@xxxxxxxxxxxx> > > Currently the kernel does not provide an infrastructure to translate > architecture numbers to a human-readable name. Translating syscall > numbers to syscall names is possible through FTRACE_SYSCALL > infrastructure but it does not provide support for compat syscalls. > > This will create a file for each PID as /proc/pid/seccomp_cache. > The file will be empty when no seccomp filters are loaded, or be > in the format of: > <hex arch number> <decimal syscall number> <ALLOW | FILTER> > where ALLOW means the cache is guaranteed to allow the syscall, > and filter means the cache will pass the syscall to the BPF filter. > > For the docker default profile on x86_64 it looks like: > c000003e 0 ALLOW > c000003e 1 ALLOW > c000003e 2 ALLOW > c000003e 3 ALLOW > [...] > c000003e 132 ALLOW > c000003e 133 ALLOW > c000003e 134 FILTER > c000003e 135 FILTER > c000003e 136 FILTER > c000003e 137 ALLOW > c000003e 138 ALLOW > c000003e 139 FILTER > c000003e 140 ALLOW > c000003e 141 ALLOW > [...] > > This file is guarded by CONFIG_PROC_SECCOMP_CACHE with a default > of N because I think certain users of seecomp might not want the > application to know which syscalls are definitely usable. > > I'm not sure if adding all the "human readable names" is worthwhile, > considering it can be easily done in userspace. The question of permissions is my central concern here: who should see this? Some contained processes have been intentionally blocked from self-introspection so even the "standard" high bar of "ptrace attach allowed?" can't always be sufficient. My compromise about filter visibility in the past was saying that CAP_SYS_ADMIN was required (see seccomp_get_filter()). I'm nervous to weaken this. (There is some work that hasn't been sent upstream yet that is looking to expose the filter _contents_ via /proc that has been nervous too.) Now full contents vs "allow"/"filter" are certainly different things, but I don't feel like I've got enough evidence to show that this introspection would help debugging enough to justify the partially imagined safety of not exposing it to potential attackers. I suspect it _is_ the right thing to do (just look at my own RFC's "debug" patch), but I'd like this to be well justified in the commit log. And yes, while it does hide behind a CONFIG, I'd still want it justified, especially since distros have a tendency to just turn everything on anyway. ;) > + for (arch = 0; arch < ARRAY_SIZE(syscall_arches); arch++) { > + for (nr = 0; nr < NR_syscalls; nr++) { > + bool cached = test_bit(nr, f->cache.syscall_ok[arch]); > + char *status = cached ? "ALLOW" : "FILTER"; > + > + seq_printf(m, "%08x %d %s\n", syscall_arches[arch], > + nr, status > + ); > + } > + } But behavior-wise, yeah, I like it; I'm fine with human-readable and full AUDIT_ARCH values. (Though, as devil's advocate again, to repeat Jann's own words back: do we want to add this only to have a new UAPI to support going forward?) -- Kees Cook