On Tue, May 19, 2020 at 09:18:47AM -0700, Alexei Starovoitov wrote: > On Mon, May 18, 2020 at 7:53 PM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote: > > > > On 2020-05-19, Jann Horn <jannh@xxxxxxxxxx> wrote: > > > On Mon, May 18, 2020 at 11:05 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote: > > > > ## deep argument inspection > > > > > > > > The argument caching bit is, I think, rather mechanical in nature since > > > > it's all "just" internal to the kernel: seccomp can likely adjust how it > > > > allocates seccomp_data (maybe going so far as to have it split across two > > > > pages with the syscall argument struct always starting on the 2nd page > > > > boundary), and copying the EA struct into that page, which will be both > > > > used by the filter and by the syscall. > > > > > > We could also do the same kind of thing the eBPF verifier does in > > > convert_ctx_accesses(), and rewrite the context accesses to actually > > > go through two different pointers depending on the (constant) offset > > > into seccomp_data. > > > > My main worry with this is that we'll need to figure out what kind of > > offset mathematics are necessary to deal with pointers inside the > > extensible struct. As a very ugly proposal, you could make it so that > > you multiply the offset by PAGE_SIZE each time you want to dereference > > the pointer at that offset (unless we want to add new opcodes to cBPF to > > allow us to represent this). > > Please don't. cbpf is frozen. https://www.youtube.com/watch?v=L0MK7qz13bU If the only workable design paths for deep arg inspection end up needing BPF helpers, I would agree that it's time for seccomp to grow eBPF language support. I'm still hoping there's a clean solution that doesn't require a seccomp language extension. > > > We don't need to actually zero-fill memory for this beyond what the > > > kernel supports - AFAIK the existing APIs already say that passing a > > > short length is equivalent to passing zeroes, so we can just replace > > > all out-of-bounds loads with zeroing registers in the filter. > > > The tricky case is what should happen if the userspace program passes > > > in fields that the filter doesn't know about. The filter can see the > > > length field passed in by userspace, and then just reject anything > > > where the length field is bigger than the structure size the filter > > > knows about. But maybe it'd be slightly nicer if there was an > > > operation for "tell me whether everything starting at offset X is > > > zeroes", so that if someone compiles with newer kernel headers where > > > the struct is bigger, and leaves the new fields zeroed, the syscalls > > > still go through an outdated filter properly. > > > > I think the best way of handling this (without breaking programs > > senselessly) is to have filters essentially emulate > > copy_struct_from_user() semantics -- which is along the lines of what > > you've suggested. > > and cpbf load instruction will become copy_from_user() underneath? No, this was meaning internal checking about struct sizes needs to exist (not the user copy parts). > I don't see how that can work. > Have you considered implications to jits, register usage, etc ? > > ebpf will become sleepable soon. It will be able to do copy_from_user() > and examine any level of user pointer dereference. > toctou is still going to be a concern though, > but such ebpf+copy_from_user analysis and syscall sandboxing > will not need to change kernel code base around syscalls at all. > No need to invent E-syscalls and all the rest I've seen in this thread. To avoid the ToCToU, the seccomp infrastructure must do the copy_from_user(), so there's not need for the sleepable stuff in seccomp that I can see. The question is mainly one of flattening. -- Kees Cook _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers