On Mon, Sep 21, 2020 at 12:49 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
>
> On Sun, Sep 20, 2020 at 10:35 PM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote:
> >
> Long-term, do you believe static analysis will be viable? I think that it is
> the "ideal" solution here, but I agree in that it is more complex.
>
> Is there a way to "prime" filters, by giving them a syscall #, and if it has
> a terminal condition without inspecting args, it turns into a bitmask entry
> viable?

I think in theory one could follow the execution of the filter, and if the filter is determined to allow a given syscall number under all circumstances, record that syscall. We could then replace the bitmap_zero call in seccomp_cache_check with a bitmap_copy from the pre-primed bitmap. However, I don't know how much benefit this would provide.

One ugly part of the current situation is that the kernel has absolutely no idea which arch numbers returned by syscall_get_arch are possible on the machine it is running on. For example, on an x86_64 machine with IA32 emulation, the arch number can be either AUDIT_ARCH_I386 or AUDIT_ARCH_X86_64, and the seccomp filter will typically have parts handling both cases. As a result, uncertainty about one syscall on one arch affects the syscall with the same number on the other arch: if a syscall number is not guaranteed to be allowed under both arches, it won't be primed. Given that a seccomp filter is usually a list of allowed syscalls, my guess is that there won't be many syscall numbers that fall under this case, though I have not tested this.

We could add an array of possible arch numbers so that the emulator can refine its tracing. This is probably the best option for the effort involved, though seccomp_cache_prepare would then have to iterate through all combinations of syscall numbers and arch numbers. Since seccomp_cache_prepare should be relatively cold, that is probably not much trouble.
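To make the priming idea a bit more concrete, here is a rough userspace sketch in C. It is not the kernel implementation and does not model seccomp_cache_prepare or seccomp_cache_check; always_allows, prime_bitmap, bitmap_test, SKETCH_NR_MAX and example_filter are all hypothetical names. It assumes the Linux UAPI headers (<linux/filter.h>, <linux/seccomp.h>, <linux/audit.h>) and understands only a small subset of classic BPF: loads of nr or arch resolve to known values, conditional jumps on them are followed, and any load of the arguments or instruction pointer, or any unhandled opcode, makes the answer "not known to allow". The arches parameter stands in for the proposed array of possible arch numbers.

```c
/* Userspace sketch, not kernel code: emulate a cBPF seccomp filter for a fixed
 * (arch, nr) pair with syscall arguments treated as unknown. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define SKETCH_NR_MAX 512 /* hypothetical upper bound on syscall numbers */

/* True only if the path taken for (arch, nr) ends in SECCOMP_RET_ALLOW
 * without ever reading the arguments or instruction pointer. */
static bool always_allows(const struct sock_filter *prog, size_t len,
                          uint32_t arch, uint32_t nr)
{
    uint32_t acc = 0;
    size_t pc = 0;
    while (pc < len) { /* cBPF jumps only go forward, so this terminates */
        const struct sock_filter *ins = &prog[pc];
        switch (ins->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            if (ins->k == offsetof(struct seccomp_data, nr))
                acc = nr;
            else if (ins->k == offsetof(struct seccomp_data, arch))
                acc = arch;
            else
                return false; /* args or IP: result depends on runtime data */
            pc++;
            break;
        case BPF_JMP | BPF_JA:
            pc += 1 + (size_t)ins->k;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
        case BPF_JMP | BPF_JGT | BPF_K:
        case BPF_JMP | BPF_JGE | BPF_K:
        case BPF_JMP | BPF_JSET | BPF_K: {
            bool taken;
            switch (BPF_OP(ins->code)) {
            case BPF_JEQ: taken = acc == ins->k; break;
            case BPF_JGT: taken = acc > ins->k; break;
            case BPF_JGE: taken = acc >= ins->k; break;
            default:      taken = (acc & ins->k) != 0; break; /* BPF_JSET */
            }
            pc += 1 + (size_t)(taken ? ins->jt : ins->jf);
            break;
        }
        case BPF_RET | BPF_K:
            return (ins->k & SECCOMP_RET_ACTION_FULL) == SECCOMP_RET_ALLOW;
        default:
            return false; /* unhandled opcode: stay conservative */
        }
    }
    return false; /* fell off the end: not a valid filter */
}

/* Set bit nr iff the filter always allows nr on every arch in arches[]. */
static void prime_bitmap(const struct sock_filter *prog, size_t len,
                         const uint32_t *arches, size_t n_arches,
                         uint64_t bitmap[SKETCH_NR_MAX / 64])
{
    assert(arches != NULL || n_arches == 0);
    for (uint32_t nr = 0; nr < SKETCH_NR_MAX; nr++) {
        bool ok = n_arches > 0;
        for (size_t a = 0; ok && a < n_arches; a++)
            ok = always_allows(prog, len, arches[a], nr);
        if (ok)
            bitmap[nr / 64] |= UINT64_C(1) << (nr % 64);
        else
            bitmap[nr / 64] &= ~(UINT64_C(1) << (nr % 64));
    }
}

static bool bitmap_test(const uint64_t *bitmap, uint32_t nr)
{
    return (bitmap[nr / 64] >> (nr % 64)) & 1;
}

/* Example: on x86_64 allow getpid (39) always and read (0) only depending on
 * args[0]; on i386 allow getpid (20); kill everything else. */
static const struct sock_filter example_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 7), /* else i386 check */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 39, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 7), /* else kill */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 5), /* loaded args[0] word != 0: kill */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_I386, 0, 3), /* else kill */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 20, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
};
static const size_t example_filter_len =
    sizeof(example_filter) / sizeof(example_filter[0]);
```

With this example filter, priming for AUDIT_ARCH_X86_64 alone marks 39, and priming for AUDIT_ARCH_I386 alone marks 20; read (0) is never marked because its result depends on args[0]. Passing both arches at once marks neither 39 nor 20, which is the cross-arch uncertainty described above. Priming each arch separately corresponds to trying every (arch, nr) combination, at the cost of one emulation per pair.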
Alternatively, we could employ constraint tracking, but that sounds overly complex for what we are trying to do.

The other question is whether pre-priming the cache would be worth the effort at all. The assumption is that the vast majority of cacheable syscalls will be permitted. For those, only the first invocation of a particular syscall pays the overhead of running the filter; after that the result is cached. So the part of the initial run that pre-priming would optimize out is going to be relatively cold anyway.

wdyt?

YiFei Zhu
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers