Re: [RFC PATCH seccomp 0/2] seccomp: Add bitmap cache of arg-independent filter results that allow syscalls

On Mon, Sep 21, 2020 at 12:49 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
>
> On Sun, Sep 20, 2020 at 10:35 PM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote:
> >
> Long-term, do you believe static analysis will be viable? I think that it is
> the "ideal" solution here, but I agree in that it is more complex.
>
> Is there a way to "prime" filters, by giving them a syscall #, and if it has
> a terminal condition without inspecting args, it turns into a bitmask entry
> viable?

I think in theory one could follow the execution of the filter, and if
the filter is determined to return a pass for a given syscall number
under all circumstances, we record that syscall. We can then replace
the bitmap_zero call in seccomp_cache_check with a call to bitmap_copy
from the pre-primed bitmap. However, I don't know how much benefit
this would provide.

One ugly part of the current situation is that the kernel has
absolutely no idea which arch numbers syscall_get_arch may return on
the machine it is running on. For example, for an
x86_64 machine with IA32 emulation, the arch number can be either
AUDIT_ARCH_I386 or AUDIT_ARCH_X86_64. The seccomp filter will
typically have parts handling both cases. As a result, an uncertainty
for one syscall on one arch will affect the syscall under the same
number for the other arch. If a syscall number is not guaranteed to be
allowed under both arches, it won't be primed. Given that a seccomp
filter is usually a list of allowed syscalls, my guess is that not
many syscall numbers will fall under this case, though I have not
tested this.
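
To make the arch problem concrete, here is a rough userspace sketch of
such a walk. Everything in it is an assumption made for illustration:
the instruction set is a hypothetical simplification rather than the
real cBPF encoding, and always_allows, prime, and demo are made-up
names. The walk treats the arch as unknown, so a comparison against it
follows both branches, and a syscall number is primed only if every
path allows it without loading an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction set; not the real cBPF encoding. */
enum op { LD_NR, LD_ARCH, LD_ARG, JEQ, RET_ALLOW, RET_KILL };

struct insn {
    enum op op;
    uint32_t k;     /* JEQ: constant compared against the accumulator */
    uint8_t jt, jf; /* JEQ: forward skip counts when equal / not equal */
};

/* What the accumulator holds while tracing. */
enum acc { ACC_NONE, ACC_NR, ACC_ARCH };

#define NR_SYSCALLS 64                 /* assumed small table for the demo */
#define DEMO_ARCH_X86_64 0xC000003Eu   /* same value as AUDIT_ARCH_X86_64 */

/*
 * Return true only if every path starting at pc returns ALLOW for
 * syscall nr without reading an argument. The arch is unknown here,
 * so a comparison against it explores both branches.
 */
static bool always_allows(const struct insn *f, size_t len, size_t pc,
                          enum acc acc, uint32_t nr)
{
    while (pc < len) {
        const struct insn *i = &f[pc];

        switch (i->op) {
        case LD_NR:     acc = ACC_NR;   pc++; break;
        case LD_ARCH:   acc = ACC_ARCH; pc++; break;
        case LD_ARG:    return false;   /* result may depend on args */
        case RET_ALLOW: return true;
        case RET_KILL:  return false;
        case JEQ:
            if (acc == ACC_NR) {
                /* The syscall number is known, so take one branch. */
                pc += 1 + (nr == i->k ? i->jt : i->jf);
                break;
            }
            if (acc == ACC_ARCH)
                return always_allows(f, len, pc + 1 + i->jt, acc, nr) &&
                       always_allows(f, len, pc + 1 + i->jf, acc, nr);
            return false;               /* comparing an unset accumulator */
        }
    }
    return false;                       /* ran off the end of the filter */
}

/* Build a bitmap of the syscall numbers that are allowed on every path. */
static uint64_t prime(const struct insn *f, size_t len)
{
    uint64_t bitmap = 0;

    assert(f != NULL && len > 0);
    for (uint32_t nr = 0; nr < NR_SYSCALLS; nr++)
        if (always_allows(f, len, 0, ACC_NONE, nr))
            bitmap |= UINT64_C(1) << nr;
    return bitmap;
}

/* Demo filter: x86_64 allows nr 1 and 3, any other arch allows only 3. */
static const struct insn demo[] = {
    /* 0 */ { LD_ARCH,   0, 0, 0 },
    /* 1 */ { JEQ,       DEMO_ARCH_X86_64, 0, 3 }, /* eq -> 2, ne -> 5 */
    /* 2 */ { LD_NR,     0, 0, 0 },
    /* 3 */ { JEQ,       1, 3, 0 },                /* eq -> 7, ne -> 4 */
    /* 4 */ { JEQ,       3, 2, 3 },                /* eq -> 7, ne -> 8 */
    /* 5 */ { LD_NR,     0, 0, 0 },
    /* 6 */ { JEQ,       3, 0, 1 },                /* eq -> 7, ne -> 8 */
    /* 7 */ { RET_ALLOW, 0, 0, 0 },
    /* 8 */ { RET_KILL,  0, 0, 0 },
};
#define DEMO_LEN (sizeof(demo) / sizeof(demo[0]))
```

With this demo filter, prime(demo, DEMO_LEN) sets only the bit for
syscall 3: syscall 1 is allowed on the x86_64 path but not on the
other one, so it is left out. Note that in this sketch a filter that
rejects unrecognized arch values would prime nothing at all, since the
walk cannot rule that path out.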

We could add an array of possible arch numbers so that the emulator
can refine its tracing. This is probably the best option in terms of
effort, though seccomp_cache_prepare would then have to iterate through
all combinations of syscall numbers and arch numbers. Given that
seccomp_cache_prepare should be relatively cold, that is probably not
too much trouble. Alternatively, we could employ constraint tracking,
but that sounds overly complex for what we are trying to do.
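
A rough userspace sketch of the arch-array variant, under the same
caveats: the instruction set is a hypothetical simplification rather
than real cBPF, and allows, prime_all, demo, and demo_archs are made-up
names, not anything in the patch. With a concrete arch supplied, the
emulator never has to fork, and each arch gets its own bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction set; not the real cBPF encoding. */
enum op { LD_NR, LD_ARCH, LD_ARG, JEQ, RET_ALLOW, RET_KILL };

struct insn {
    enum op op;
    uint32_t k;     /* JEQ: constant compared against the accumulator */
    uint8_t jt, jf; /* JEQ: forward skip counts when equal / not equal */
};

#define NR_SYSCALLS 64                 /* assumed small table for the demo */
#define DEMO_ARCH_X86_64 0xC000003Eu   /* same value as AUDIT_ARCH_X86_64 */
#define DEMO_ARCH_I386   0x40000003u   /* same value as AUDIT_ARCH_I386 */

/* Run the filter for one concrete (arch, nr); any arg load is a miss. */
static bool allows(const struct insn *f, size_t len,
                   uint32_t arch, uint32_t nr)
{
    uint32_t acc = 0;
    size_t pc = 0;

    while (pc < len) {
        const struct insn *i = &f[pc];

        switch (i->op) {
        case LD_NR:     acc = nr;   pc++; break;
        case LD_ARCH:   acc = arch; pc++; break;
        case LD_ARG:    return false;   /* result may depend on args */
        case JEQ:       pc += 1 + (acc == i->k ? i->jt : i->jf); break;
        case RET_ALLOW: return true;
        case RET_KILL:  return false;
        }
    }
    return false;                       /* ran off the end of the filter */
}

/* Fill one bitmap per arch by trying every (arch, nr) combination. */
static void prime_all(const struct insn *f, size_t len,
                      const uint32_t *archs, size_t n_archs,
                      uint64_t *bitmaps)
{
    assert(f != NULL && len > 0);
    for (size_t a = 0; a < n_archs; a++) {
        bitmaps[a] = 0;
        for (uint32_t nr = 0; nr < NR_SYSCALLS; nr++)
            if (allows(f, len, archs[a], nr))
                bitmaps[a] |= UINT64_C(1) << nr;
    }
}

/* Demo: x86_64 allows nr 1 and 3, i386 allows nr 3, other arches die. */
static const struct insn demo[] = {
    /*  0 */ { LD_ARCH,   0, 0, 0 },
    /*  1 */ { JEQ,       DEMO_ARCH_X86_64, 1, 0 }, /* eq -> 3, ne -> 2  */
    /*  2 */ { JEQ,       DEMO_ARCH_I386,   4, 7 }, /* eq -> 7, ne -> 10 */
    /*  3 */ { LD_NR,     0, 0, 0 },
    /*  4 */ { JEQ,       1, 4, 0 },                /* eq -> 9, ne -> 5  */
    /*  5 */ { JEQ,       3, 3, 0 },                /* eq -> 9, ne -> 6  */
    /*  6 */ { RET_KILL,  0, 0, 0 },
    /*  7 */ { LD_NR,     0, 0, 0 },
    /*  8 */ { JEQ,       3, 0, 1 },                /* eq -> 9, ne -> 10 */
    /*  9 */ { RET_ALLOW, 0, 0, 0 },
    /* 10 */ { RET_KILL,  0, 0, 0 },
};
#define DEMO_LEN (sizeof(demo) / sizeof(demo[0]))

static const uint32_t demo_archs[] = { DEMO_ARCH_X86_64, DEMO_ARCH_I386 };
```

The cost is n_archs * NR_SYSCALLS filter runs per prepare, which is
the iteration over all combinations mentioned above. In exchange, the
x86_64 bitmap can record syscall 1 even though i386 does not allow it.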

The other question is whether pre-priming the cache would be worth the
effort. The assumption is that the vast majority of cacheable syscalls
will be permitted. For them, only the first invocation of a particular
syscall pays the overhead of calling the filter, so the part of the
initial run that pre-priming would optimize out is itself relatively
cold. What do you think?

YiFei Zhu
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers


