On Fri, Sep 25, 2020 at 12:56 AM Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> wrote:
> Yes, the man page would read something like
>
>        SECCOMP_SET_MODE_FILTER_BITMAP
>               The system calls allowed are defined by a pointer to a
>               Berkeley Packet Filter (BPF) passed via args.
>               This argument is a pointer to a struct sock_fprog_bitmap;
>
> with that struct containing whatever information/extra pointers needed
> for passing the bitmap(s) in addition to the bpf prog.
>
> And SECCOMP_SET_MODE_FILTER would internally just be updated to work
> as-if all-zero allow-bitmaps were passed along. The internal kernel
> bitmap would just be the and of the bitmaps in the filter stack.
>
> Sure, it's UAPI, so would certainly need more careful thought on details
> of just how the arg struct looks like etc. etc., but I was wondering why
> it hadn't been discussed at all.

If a SECCOMP_SET_MODE_FILTER filter is attached before or after a
SECCOMP_SET_MODE_FILTER_BITMAP filter, does that mean all the bitmaps
become void? Would it make sense to run SECCOMP_SET_MODE_FILTER filters
through the emulator anyway, to see if we can construct a bitmap for
"legacy no-bitmap" support?

Another thing to consider is that in both patch series we only
construct one final bitmap. If the bit is set, seccomp will not call
into the BPF filters. If the bit is not set, then all filters are
called in sequence, even if some of them "must allow the syscall".
With SECCOMP_SET_MODE_FILTER_BITMAP, the filter's BPF code will no
longer have the "if it's this syscall" check for any syscall that is
given in the bitmap, so calling into such a filter will produce a false
negative. So we would need extra logic for "does this filter have a
bitmap? if so, check the bitmap first". It probably won't be too
complicated, but idk if it is actually worth the complexity. wdyt?

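To make the "check the bitmap first" idea concrete, here is a minimal
userspace model in C. It is only a sketch, not kernel code: every name
in it (sk_filter, sk_combine, sk_check, the two example "programs") is
hypothetical, plain C functions stand in for BPF programs, and the
syscall numbers are x86-64 values picked for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NR_SYSCALLS 512               /* hypothetical upper bound */
#define SK_WORDS (SK_NR_SYSCALLS / 64)

enum sk_verdict { SK_KILL = 0, SK_ALLOW = 1 };

/* One entry in the filter stack; 'prev' points at the older filter. */
struct sk_filter {
    bool has_bitmap;                          /* false = legacy filter */
    uint64_t allow[SK_WORDS];                 /* per-filter allow bitmap */
    enum sk_verdict (*run)(unsigned int nr);  /* stand-in for the BPF prog */
    struct sk_filter *prev;
};

static void sk_bitmap_set(uint64_t *map, unsigned int nr)
{
    assert(nr < SK_NR_SYSCALLS);
    map[nr / 64] |= UINT64_C(1) << (nr % 64);
}

static bool sk_bitmap_test(const uint64_t *map, unsigned int nr)
{
    return nr < SK_NR_SYSCALLS && ((map[nr / 64] >> (nr % 64)) & 1);
}

/* Combined bitmap = AND over the stack; a legacy filter counts as all-zero. */
static void sk_combine(const struct sk_filter *top, uint64_t *out)
{
    for (size_t i = 0; i < SK_WORDS; i++)
        out[i] = UINT64_MAX;
    for (const struct sk_filter *f = top; f; f = f->prev)
        for (size_t i = 0; i < SK_WORDS; i++)
            out[i] &= f->has_bitmap ? f->allow[i] : 0;
}

/*
 * Fast path: combined bit set, so no filter program runs.
 * Slow path: walk the stack, but skip any filter whose own bitmap already
 * allows the syscall, because its program no longer checks for it.
 * '*calls' reports how many filter programs were actually invoked.
 */
static enum sk_verdict sk_check(const struct sk_filter *top,
                                const uint64_t *combined,
                                unsigned int nr, unsigned int *calls)
{
    enum sk_verdict v = SK_ALLOW;

    *calls = 0;
    if (sk_bitmap_test(combined, nr))
        return SK_ALLOW;
    for (const struct sk_filter *f = top; f; f = f->prev) {
        if (f->has_bitmap && sk_bitmap_test(f->allow, nr))
            continue;
        (*calls)++;
        if (f->run(nr) == SK_KILL)
            v = SK_KILL;                 /* most restrictive verdict wins */
    }
    return v;
}

/* Example program of a bitmapped filter: only knows about write (1). */
static enum sk_verdict sk_run_bitmapped(unsigned int nr)
{
    return nr == 1 ? SK_ALLOW : SK_KILL;
}

/* Example legacy program: denies execve (59), allows everything else. */
static enum sk_verdict sk_run_legacy(unsigned int nr)
{
    return nr == 59 ? SK_KILL : SK_ALLOW;
}
```

In this model, stacking a legacy filter on top of a bitmapped one
zeroes the combined bitmap, so every syscall takes the slow path.
Without the per-filter skip, getpid (39) would then reach
sk_run_bitmapped() and be killed even though that filter's bitmap
allows it.
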
> Regardless, I'd like to see some numbers, certainly for the "how much
> faster does a getpid() or read() or any of the other syscalls that
> nobody disallows" get, but also "what's the cost of doing that emulation
> at seccomp(2) time".

The former has been given in my RFC patch [1]. In an extreme case of
no side channel mitigations, in the same amount of time, unixbench
syscall mixed runs 33295685 syscalls without seccomp, 20661056
syscalls with docker profile, 25719937 syscalls with bitmapped docker
profile. Though, I think Jack was running on Ubuntu and it did not
have a libseccomp shipped with the distro that's new enough to do the
binary decision tree generation [2].

I'll try to profile the latter later on my qemu-kvm, with a recent
libseccomp with binary tree and docker's profile, probably both direct
filter attaches and filter attaches with fork(). I'm guessing if I
have fork() the cost of fork() will overshadow seccomp() though.

[1] https://lore.kernel.org/containers/cover.1600661418.git.yifeifz2@xxxxxxxxxxxx/
[2] https://github.com/seccomp/libseccomp/pull/152

YiFei Zhu