On Tue, Jun 16, 2020 at 12:49 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> In order to build this mapping at filter attach time, each filter is
> executed for every syscall (under each possible architecture), and
> checked for any accesses of struct seccomp_data that are not the "arch"
> nor "nr" (syscall) members. If only "arch" and "nr" are examined, then
> there is a constant mapping for that syscall, and bitmaps can be updated
> accordingly. If any accesses happen outside of those struct members,
> seccomp must not bypass filter execution for that syscall, since program
> state will be used to determine filter action result.
>
> During syscall action probing, in order to determine whether other members
> of struct seccomp_data are being accessed during a filter execution,
> the struct is placed across a page boundary with the "arch" and "nr"
> members in the first page, and everything else in the second page. The
> "page accessed" flag is cleared in the second page's PTE, and the filter
> is run. If the "page accessed" flag appears as set after running the
> filter, we can determine that the filter looked beyond the "arch" and
> "nr" members, and exclude that syscall from the constant action bitmaps.

This is... evil.  I don't know how I feel about it.  It's also
potentially quite slow.

I don't suppose you could, instead, instrument the BPF code to get at
this without TLB hackery?  Or maybe try to do some real symbolic
execution of the BPF code?

--Andy
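
To make the "without TLB hackery" idea concrete, here is a minimal userspace sketch of a purely static check on a classic BPF program. It is not what the patch does and is not necessarily what was meant above; it only scans the instruction stream for loads from struct seccomp_data at offsets other than "nr" and "arch". The helper name filter_reads_only_nr_and_arch(), the two sample filters, and EXAMPLE_NR are hypothetical, introduced for illustration. It assumes the Linux UAPI headers <linux/filter.h> and <linux/seccomp.h> are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Arbitrary syscall number, used only to build the sample filters. */
#define EXAMPLE_NR 39

/*
 * Hypothetical helper: returns true if no instruction in the program can
 * read struct seccomp_data anywhere other than the "nr" or "arch" members.
 * This is a conservative whole-filter scan, not symbolic execution.
 */
static bool filter_reads_only_nr_and_arch(const struct sock_filter *insns,
					  size_t len)
{
	assert(insns != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		unsigned short code = insns[i].code;
		unsigned short cls = BPF_CLASS(code);

		if (cls != BPF_LD && cls != BPF_LDX)
			continue;

		switch (BPF_MODE(code)) {
		case BPF_IMM:
		case BPF_MEM:
		case BPF_LEN:
			/* Constants, scratch memory, length: no data read. */
			break;
		case BPF_ABS:
			/* Absolute load: only "nr" and "arch" are acceptable. */
			if (insns[i].k != offsetof(struct seccomp_data, nr) &&
			    insns[i].k != offsetof(struct seccomp_data, arch))
				return false;
			break;
		default:
			/* Indirect or other modes: offset unknown, be safe. */
			return false;
		}
	}
	return true;
}

/* Sample filter whose result depends only on the syscall number. */
static const struct sock_filter nr_only[] = {
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXAMPLE_NR, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};

/* Sample filter that also inspects the first syscall argument. */
static const struct sock_filter reads_args[] = {
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXAMPLE_NR, 0, 2),
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};
```

With these definitions the helper accepts nr_only and rejects reads_args. Note the limitation: a scan like this is per filter, not per syscall, so a filter that looks at arguments for a single syscall would disqualify every syscall from the constant-action bitmaps. Getting per-syscall precision would mean following the branches for each (arch, nr) pair, which is where real symbolic execution would come in.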