Re: [PATCH v2 seccomp 2/6] asm/syscall.h: Add syscall_arches[] array

Kees Cook <keescook@xxxxxxxxxxxx> · Thu, 24 Sep 2020 20:09:07 -0700

On Thu, Sep 24, 2020 at 08:27:40PM -0500, YiFei Zhu wrote:
> [resending this too]
> 
> On Thu, Sep 24, 2020 at 6:01 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > Disregarding the "how" of this, yeah, we'll certainly need something to
> > tell seccomp about the arrangement of syscall tables and how to find
> > them.
> >
> > However, I'd still prefer to do this on a per-arch basis, and include
> > more detail, as I've got in my v1.
> >
> > Something missing from both styles, though, is a consolidation of
> > values, where the AUDIT_ARCH* isn't reused in both the seccomp info and
> > the syscall_get_arch() return. The problems here were two-fold:
> >
> > 1) putting this in syscall.h meant you do not have full NR_syscall*
> >    visibility on some architectures (e.g. arm64 plays weird games with
> >    header include order).
> 
> I don't get this one -- I'm not playing with NR_syscall here.

Right, sorry, I may not have been clear. When building my RFC I noticed
that I couldn't use NR_syscall very "early" in the header file include
stack on arm64, which complicated things. So I guess what I mean is
something like "it's probably better to do all these seccomp-specific
macros/etc in asm/include/seccomp.h rather than in syscall.h because I
know at least one architecture that might cause trouble."

> > 2) seccomp needs to handle "multiplexed" tables like x86_x32 (distros
> >    haven't removed CONFIG_X86_X32 widely yet, so it is a reality that
> >    it must be dealt with), which means seccomp's idea of the arch
> >    "number" can't be the same as the AUDIT_ARCH.
> 
> Why so? Does anyone actually use x32 in a container? The memory cost
> and analysis cost is on everyone. The worst case scenario if we don't
> support it is that the syscall is not accelerated.

Ironicailly, that's the only place I actually know for sure where people
using x32 because it shows measurable (10%) speed-up for builders:
https://lore.kernel.org/lkml/CAOesGMgu1i3p7XMZuCEtj63T-ST_jh+BfaHy-K6LhgqNriKHAA@xxxxxxxxxxxxxx

So, yes, as you and Jann both point out, it wouldn't be terrible to just
ignore x32, it seems a shame to penalize it. That said, if the masking
step from my v1 is actually noticable on a native workload, then yeah,
probably x32 should be ignored. My instinct (not measured) is that it's
faster than walking a small array.[citation needed]

> > So, likely a combo of approaches is needed: an array (or more likely,
> > enum), declared in the per-arch seccomp.h file. And I don't see a way
> > to solve #1 cleanly.
> >
> > Regardless, it needs to be split per architecture so that regressions
> > can be bisected/reverted/isolated cleanly. And if we can't actually test
> > it at runtime (or find someone who can) it's not a good idea to make the
> > change. :)
> 
> You have a good point regarding tests. Don't see how it affects
> regressions though. Only one file here is ever included per-build.

It's easier to do a per-arch revert (i.e. all the -stable tree
machinery, etc) with a single SHA instead of having to write a partial
revert, etc.

-- 
Kees Cook