On Wed, Feb 14, 2018 at 3:29 PM, Tycho Andersen <tycho@xxxxxxxx> wrote:
> Hey Kees,
>
> Thanks for taking a look!
>
> On Tue, Feb 13, 2018 at 01:09:20PM -0800, Kees Cook wrote:
>> On Sun, Feb 4, 2018 at 2:49 AM, Tycho Andersen <tycho@xxxxxxxx> wrote:
>> > This patch introduces a means for syscalls matched in seccomp to notify
>> > some other task that a particular filter has been triggered.
>> >
>> > The motivation for this is primarily for use with containers. For example,
>> > if a container does an init_module(), we obviously don't want to load this
>> > untrusted code, which may be compiled for the wrong version of the kernel
>> > anyway. Instead, we could parse the module image, figure out which module
>> > the container is trying to load and load it on the host.
>> >
>> > As another example, containers cannot mknod(), since this checks
>> > capable(CAP_SYS_ADMIN). However, harmless devices like /dev/null or
>> > /dev/zero should be ok for containers to mknod, but we'd like to avoid hard
>> > coding some whitelist in the kernel. Another example is mount(), which has
>> > many security restrictions for good reason, but configuration or runtime
>> > knowledge could potentially be used to relax these restrictions.
>>
>> Related to the eBPF seccomp thread, can the logic for these things be
>> handled entirely by eBPF? My assumption is that you still need to stop
>> the process to do something (i.e. do a mknod, or a mount) before
>> letting it continue. Is there some "wait for notification" system in
>> eBPF?
>
> I replied in the other thread
> (https://patchwork.ozlabs.org/cover/872938/#1856642 for those
> following along at home), but no, at least not that I know of.

eBPF can call functions. One of those functions could put the caller
to sleep. In fact, I think I once proposed doing this for the seccomp
logging action as well.

>> I wonder if this communication should be netlink, which gives a more
>> well-structured way to describe what's on the wire? The reason I ask
>> is because if we ever change the seccomp_data structure, we'll now
>> have two places where we need to deal with it (the first being within
>> the BPF itself). My initial idea was to prefix the communication with
>> a size field, then send the structure, and then I had nightmares, and
>> realized this was basically netlink reinvented.
>
> I suggested netlink in LA, and everyone (especially Andy) groaned very
> loudly :). I'm happy to switch it to netlink if you like, although I
> think memcpy() of structs should be safe here, since the return value
> from read or write can indicate the size of things.

I could easily get on board with "netlink" (i.e. NLA, netlink
attribute) messages sent over an fd. I will object strongly to the use
of netlink *sockets*.

>
>> An ERRNO filter would block a USER_NOTIF because it's unconditional.
>> TRACE could be either, USER_NOTIF could be either.
>>
>> This means TRACE rules would be bumped by a USER_NOTIF... hmm.
>
> Yes, I didn't exactly know what to do here. ERRNO, TRAP, and KILL all
> seemed more important than USER_NOTIF, but TRACE didn't. I don't have
> a strong opinion about what to do here, because users can adjust their
> filters accordingly. Let me know what you prefer.

If we switched to eBPF functions, this whole issue would go away.
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
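The precedence question in this thread (an ERRNO result blocking USER_NOTIF, and USER_NOTIF bumping TRACE) comes from how results of stacked seccomp filters are combined: every filter runs, and the result whose action part is numerically lowest when read as a signed 32-bit value wins. The sketch below models that rule, roughly what `seccomp_run_filters()` in kernel/seccomp.c does. It assumes the action values later merged into mainline `<linux/seccomp.h>`; the patch under discussion may have used a different value for USER_NOTIF, and `action_only`/`combine` are hypothetical helper names chosen for illustration.

```python
from typing import Iterable

# Action values as in mainline <linux/seccomp.h> (assumption: the RFC patch
# discussed here may have numbered USER_NOTIF differently).
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_KILL_THREAD = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_USER_NOTIF = 0x7FC00000
SECCOMP_RET_TRACE = 0x7FF00000
SECCOMP_RET_LOG = 0x7FFC0000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000


def action_only(ret: int) -> int:
    """Hypothetical helper: drop the 16 data bits and reinterpret the
    remaining action bits as a signed 32-bit integer."""
    masked = ret & SECCOMP_RET_ACTION_FULL
    return masked - (1 << 32) if masked & 0x80000000 else masked


def combine(rets: Iterable[int]) -> int:
    """Hypothetical helper: pick the winning result among stacked filters.

    Starts from ALLOW and keeps whichever result has the lowest signed
    action value; on a tie, the earlier entry in `rets` is kept.
    """
    best = SECCOMP_RET_ALLOW
    for cur in rets:
        if action_only(cur) < action_only(best):
            best = cur
    return best
```

With these values, an ERRNO result from any filter (for example `SECCOMP_RET_ERRNO | 1`, carrying EPERM in the data bits) overrides a USER_NOTIF from another filter, while USER_NOTIF in turn overrides TRACE, which matches the ordering described in the thread.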