On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> This patchset enables seccomp filters to be written in eBPF. Although
> this patchset doesn't introduce much of the functionality enabled by
> eBPF, it lays the groundwork for it.
>
> It also introduces the capability to dump eBPF filters via the PTRACE
> API so that CHECKPOINT_RESTORE will be satisfied. In the attached
> samples, there's an example of this. One can then use
> BPF_OBJ_GET_INFO_BY_FD to get the actual code of the program, and use
> that at reload time.
>
> The primary reason for not adding maps support in this patchset is to
> avoid introducing new complexities around PR_SET_NO_NEW_PRIVS. If we
> have a map that the BPF program can read, it can potentially "change"
> privileges after running. It seems like doing writes only is safe,
> because it can be pure and side-effect free, and therefore not
> negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come to an
> agreement, this can be in a follow-up patchset.
Coincidentally, I also sent an RFC for adding eBPF hash maps to the
seccomp userspace mailing list just last week:
https://groups.google.com/forum/#!topic/libseccomp/pX6QkVF0F74

The kernel changes I proposed are in this email:
https://groups.google.com/d/msg/libseccomp/pX6QkVF0F74/ZUJlwI5qAwAJ

In that email thread, Kees requested that I try out a binary tree in
cBPF and evaluate its performance. I just got a rough prototype
working, and while not as fast as an eBPF hash map, the cBPF binary
tree was a significant improvement over the linear list of ifs that is
currently generated. Also, it only required changing a single function
within the libseccomp library itself:
https://github.com/drakenclimber/libseccomp/commit/87b36369f17385f5a7a4d95101185577fbf6203b

Here are the results I am currently seeing using an in-house customer's
seccomp filter and a simplistic test program that runs getppid()
thousands of times.

Test Case                                             minimum TSC ticks to make syscall
---------------------------------------------------------------------------------------
seccomp disabled                                      620
getppid() at the front of 306-syscall seccomp filter  722
getppid() in middle of 306-syscall seccomp filter     1392
getppid() at the end of the 306-syscall filter        2452
seccomp using a 306-syscall-sized eBPF hash map       800
cBPF filter using a binary tree                       922

Thanks.

Tom
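
To illustrate why a tree helps compared with a linear list of ifs, here
is a minimal sketch in plain C that only counts comparisons. It is not
cBPF and it is not the code from the libseccomp commit; the function
names (fill_syscalls, linear_compares, tree_lookup) and the placeholder
syscall numbers 0..305 are assumptions made up for the example, and it
assumes the syscall numbers are kept sorted.

```c
#include <stddef.h>

#define NSYSCALLS 306   /* size of the filter used in the measurements */

/* Fill nrs[] with n ascending placeholder syscall numbers (hypothetical;
 * a real filter would hold the actual syscall numbers it allows). */
static void fill_syscalls(int *nrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        nrs[i] = (int)i;
}

/* Linear list of ifs: one equality test per rule, in rule order.
 * Returns the number of comparisons made up to and including the match
 * (or n if nothing matches). */
static size_t linear_compares(const int *nrs, size_t n, int target)
{
    size_t compares = 0;

    for (size_t i = 0; i < n; i++) {
        compares++;
        if (nrs[i] == target)
            break;
    }
    return compares;
}

/* Binary tree over sorted numbers: one "greater than" test per level,
 * then a single equality test at the leaf. Returns the matching index
 * or -1, and stores the number of comparisons in *compares.
 * Assumes n > 0. */
static long tree_lookup(const int *nrs, size_t n, int target,
                        size_t *compares)
{
    size_t lo = 0, hi = n - 1;

    *compares = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        (*compares)++;
        if (target > nrs[mid])
            lo = mid + 1;
        else
            hi = mid;
    }
    (*compares)++;
    return nrs[lo] == target ? (long)lo : -1;
}
```

With 306 entries, linear_compares() needs 1 comparison for the first
entry and 306 for the last, while tree_lookup() needs at most 10 for any
entry. These are comparison counts in a toy model, not TSC ticks, so
they only indicate the trend and do not reproduce the measurements.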