On Wed, May 20, 2020 at 12:04:04PM -0700, Kees Cook wrote:
> On Wed, May 20, 2020 at 11:27:03AM -0700, Linus Torvalds wrote:
> > Don't make this some kind of abstract conceptual problem thing.
> > Because it's not.
>
> I have no intention of making this abstract (the requests for expanding
> seccomp coverage have been for only a select class of syscalls, and
> specifically clone3 and openat2) nor more complicated than it needs to be
> (I regularly resist expanding the seccomp BPF dialect into eBPF).

Kees, since you've forked the thread I'm adding the bpf mailing list back
and re-iterating my point:

** Nack to cBPF extensions **

How is that relevant?
You're proposing to add copy_from_user() to selected syscalls, like clone3,
and present a large __u32 array to the cBPF program.
In other words, the existing fixed-size 'struct seccomp_data' will become
either variable length or a jumbo fixed size like one page.

In the former case it would mean that cBPF would need to be extended
with variable length logic. Which in turn means it will suffer from
spectre v1 issues. We've spent a lot of time fixing spectre v1 issues
with eBPF. Including teaching the verifier to recognize speculative
patterns inside the programs so that malicious bpf progs trying to
exploit spec v1 will be caught at load time. There is no other tool
(compiler or static analysis) that can do similar analysis.
I suggest that you look into what eBPF is actually doing instead of
trying to reinvent the wheel.

If you go with the latter approach of presenting cBPF with a giant
'struct seccomp_data + page', that extra page would need to be zeroed out
before each invocation of the bpf program, which will make seccomp even
less usable than it is today. Currently it's slow and unusable in a
production datacenter.

People suggested for years to adopt eBPF in seccomp to accelerate it,
but, as you confessed, you resisted and it sounds like now you want to
implement a seccomp-specific syscall bitmask?
Which means more kernel code, more bugs, more security issues.
imo that's another reinvented wheel when eBPF can do it already.
I don't think it's a good idea to add kernel code when an eBPF-based
solution exists and is capable of examining any level of nested args.

> Perhaps the question is "how deeply does seccomp need to inspect?"
> and maybe it does not get to see anything beyond just the "top level"
> struct (i.e. struct clone_args) and all pointers within THAT become
> opaque? That certainly simplifies the design.

clone3's 'struct clone_args' has the set_tid pointer as a second level.
I don't think that sticking to the first level of pointers for this
particular syscall will make seccomp filtering any more practical.
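
For readers following along, here is a minimal sketch of the two layouts
discussed above. It is an illustration under stated assumptions, not code
from this thread. 'struct clone_args_mirror' is a hypothetical local copy
of the uapi 'struct clone_args' layout (assumed to match
include/uapi/linux/sched.h as of v5.7), and 'deny_clone3' is a hypothetical
filter name. The filter only shows that a cBPF seccomp program reads
'struct seccomp_data' through constant absolute offsets. It skips the
'arch' check a real filter needs, and rejecting clone3 outright is just one
coarse policy available when the arguments behind the pointer cannot be
inspected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

#ifndef __NR_clone3
#define __NR_clone3 435 /* assumption: fallback for headers that predate clone3 */
#endif

/*
 * Hypothetical mirror of the uapi struct clone_args, declared locally so
 * the sketch builds against older headers.
 */
struct clone_args_mirror {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;      /* user pointer to a pid_t array: the second level */
	uint64_t set_tid_size; /* number of elements in that array */
	uint64_t cgroup;
};

/* The filter input today is fixed size: nr, arch, ip and six u64 args. */
static_assert(sizeof(struct seccomp_data) == 64, "seccomp_data is 64 bytes");

/* A first-level copy exposes set_tid only as an address, not the pids. */
static_assert(offsetof(struct clone_args_mirror, set_tid) == 64,
	      "set_tid follows eight u64 fields");

/*
 * cBPF seccomp filter: every load is a 32-bit read at a constant offset
 * into struct seccomp_data. The arch check is omitted here for brevity.
 */
static const struct sock_filter deny_clone3[] = {
	/* A = seccomp_data.nr */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* if (A == __NR_clone3) fall through, else skip one instruction */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone3, 0, 1),
	/* clone3: fail with ENOSYS */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ENOSYS),
	/* everything else: allow */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```

To install such a filter, a process would wrap the array in a
'struct sock_fprog' and pass it to seccomp(SECCOMP_SET_MODE_FILTER) after
setting PR_SET_NO_NEW_PRIVS. Note that args[0] of clone3 is only the user
address of 'struct clone_args', so neither the flags nor the pids behind
set_tid are visible to the filter above.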