On Thu, Sep 24, 2020 at 08:27:40PM -0500, YiFei Zhu wrote:
> [resending this too]
>
> On Thu, Sep 24, 2020 at 6:01 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > Disregarding the "how" of this, yeah, we'll certainly need something to
> > tell seccomp about the arrangement of syscall tables and how to find
> > them.
> >
> > However, I'd still prefer to do this on a per-arch basis, and include
> > more detail, as I've got in my v1.
> >
> > Something missing from both styles, though, is a consolidation of
> > values, where the AUDIT_ARCH* isn't reused in both the seccomp info and
> > the syscall_get_arch() return. The problems here were two-fold:
> >
> > 1) putting this in syscall.h meant you do not have full NR_syscall*
> >    visibility on some architectures (e.g. arm64 plays weird games with
> >    header include order).
>
> I don't get this one -- I'm not playing with NR_syscall here.

Right, sorry, I may not have been clear. When building my RFC I noticed
that I couldn't use NR_syscall very "early" in the header file include
stack on arm64, which complicated things. So I guess what I mean is
something like "it's probably better to do all these seccomp-specific
macros/etc in asm/include/seccomp.h rather than in syscall.h because I
know at least one architecture that might cause trouble."

> > 2) seccomp needs to handle "multiplexed" tables like x86_x32 (distros
> >    haven't removed CONFIG_X86_X32 widely yet, so it is a reality that
> >    it must be dealt with), which means seccomp's idea of the arch
> >    "number" can't be the same as the AUDIT_ARCH.
>
> Why so? Does anyone actually use x32 in a container? The memory cost
> and analysis cost is on everyone. The worst case scenario if we don't
> support it is that the syscall is not accelerated.

Ironically, that's the only place I actually know for sure where people
are using x32, because it shows a measurable (10%) speed-up for builders:
https://lore.kernel.org/lkml/CAOesGMgu1i3p7XMZuCEtj63T-ST_jh+BfaHy-K6LhgqNriKHAA@xxxxxxxxxxxxxx

So, yes, as you and Jann both point out, it wouldn't be terrible to just
ignore x32, though it seems a shame to penalize it. That said, if the
masking step from my v1 is actually noticeable on a native workload, then
yeah, probably x32 should be ignored. My instinct (not measured) is that
it's faster than walking a small array.

> > So, likely a combo of approaches is needed: an array (or more likely,
> > enum), declared in the per-arch seccomp.h file. And I don't see a way
> > to solve #1 cleanly.
> >
> > Regardless, it needs to be split per architecture so that regressions
> > can be bisected/reverted/isolated cleanly. And if we can't actually test
> > it at runtime (or find someone who can) it's not a good idea to make the
> > change. :)
>
> You have a good point regarding tests. Don't see how it affects
> regressions though. Only one file here is ever included per-build.

It's easier to do a per-arch revert (i.e. all the -stable tree
machinery, etc) with a single SHA instead of having to write a partial
revert, etc.

--
Kees Cook
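
For readers following along, here is a rough userspace sketch of what a
"masking step" for a multiplexed table could look like. It is not the code
from either patch series. The two constants mirror __X32_SYSCALL_BIT and
AUDIT_ARCH_X86_64 from the UAPI headers; the enum, the struct and
classify() are hypothetical names made up for illustration. The point it
tries to show is that x32 shares AUDIT_ARCH_X86_64 with native x86_64, so
the arch value alone cannot select a table and the syscall number has to be
inspected as well.

```c
#include <stdint.h>

/* Same values as __X32_SYSCALL_BIT and AUDIT_ARCH_X86_64 in the UAPI headers. */
#define X32_SYSCALL_BIT   0x40000000u
#define AUDIT_ARCH_X86_64 0xC000003Eu

/* Hypothetical seccomp-internal table ids; these are not kernel names. */
enum table_id { TABLE_X86_64, TABLE_X32, TABLE_UNKNOWN };

/* Which table a syscall belongs to, and its index within that table. */
struct lookup {
	enum table_id table;
	uint32_t nr;
};

/*
 * Assumption: only the x86_64 arch value is known to this sketch.
 * x32 reports the same AUDIT_ARCH as native x86_64, so the table is
 * chosen by testing the x32 bit in the syscall number and then
 * masking it off to get a plain index.
 */
static struct lookup classify(uint32_t audit_arch, uint32_t nr)
{
	struct lookup r = { TABLE_UNKNOWN, nr };

	if (audit_arch != AUDIT_ARCH_X86_64)
		return r;

	if (nr & X32_SYSCALL_BIT) {
		r.table = TABLE_X32;
		r.nr = nr & ~X32_SYSCALL_BIT;
	} else {
		r.table = TABLE_X86_64;
	}
	return r;
}
```

With this, classify(AUDIT_ARCH_X86_64, 1) lands in the native table at
index 1, while classify(AUDIT_ARCH_X86_64, 0x40000001) lands in the x32
table at the same index. Whether the extra test-and-mask costs anything
measurable on a native workload is exactly the open question above.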