On Fri, Jun 13, 2014 at 2:42 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Fri, Jun 13, 2014 at 2:37 PM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>> On Fri, Jun 13, 2014 at 2:25 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> On Fri, Jun 13, 2014 at 2:22 PM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>>>> On Tue, Jun 10, 2014 at 8:25 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>> This adds the new "seccomp" syscall with both an "operation" and "flags"
>>>>> parameter for future expansion. The third argument is a pointer value,
>>>>> used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
>>>>> be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
>>>>>
>>>>> Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
>>>>> Cc: linux-api@xxxxxxxxxxxxxxx
>>>>> ---
>>>>> arch/x86/syscalls/syscall_32.tbl | 1 +
>>>>> arch/x86/syscalls/syscall_64.tbl | 1 +
>>>>> include/linux/syscalls.h | 2 ++
>>>>> include/uapi/asm-generic/unistd.h | 4 ++-
>>>>> include/uapi/linux/seccomp.h | 4 +++
>>>>> kernel/seccomp.c | 63 ++++++++++++++++++++++++++++++++-----
>>>>> kernel/sys_ni.c | 3 ++
>>>>> 7 files changed, 69 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
>>>>> index d6b867921612..7527eac24122 100644
>>>>> --- a/arch/x86/syscalls/syscall_32.tbl
>>>>> +++ b/arch/x86/syscalls/syscall_32.tbl
>>>>> @@ -360,3 +360,4 @@
>>>>> 351 i386 sched_setattr sys_sched_setattr
>>>>> 352 i386 sched_getattr sys_sched_getattr
>>>>> 353 i386 renameat2 sys_renameat2
>>>>> +354 i386 seccomp sys_seccomp
>>>>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>>>>> index ec255a1646d2..16272a6c12b7 100644
>>>>> --- a/arch/x86/syscalls/syscall_64.tbl
>>>>> +++ b/arch/x86/syscalls/syscall_64.tbl
>>>>> @@ -323,6 +323,7 @@
>>>>> 314 common sched_setattr sys_sched_setattr
>>>>> 315 common sched_getattr sys_sched_getattr
>>>>> 316 common renameat2 sys_renameat2
>>>>> +317 common seccomp sys_seccomp
>>>>>
>>>>> #
>>>>> # x32-specific system call numbers start at 512 to avoid cache impact
>>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>>> index b0881a0ed322..1713977ee26f 100644
>>>>> --- a/include/linux/syscalls.h
>>>>> +++ b/include/linux/syscalls.h
>>>>> @@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>>>>> asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
>>>>> unsigned long idx1, unsigned long idx2);
>>>>> asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
>>>>> +asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>>>>> + const char __user *uargs);
>>>>
>>>> It looks odd to add a 'flags' argument to a syscall that is not even used.
>>>> I don't think it will be extensible this way.
>>>> 'uargs' is used only in the 2nd command as well and it's not 'char __user *'
>>>> but rather 'struct sock_fprog __user *'
>>>> I think it makes more sense to define only the first argument as 'int op' and
>>>> the rest as a variable length array.
>>>> Something like:
>>>> long sys_seccomp(unsigned int op, struct nlattr *attrs, int len);
>>>> then different commands can interpret 'attrs' differently.
>>>> if op == mode_strict, then attrs == NULL, len == 0
>>>> if op == mode_filter, then attrs->nla_type == seccomp_bpf_filter
>>>> and nla_data(attrs) is 'struct sock_fprog'
>>>
>>> Eww. If the operation doesn't imply the type, then I think we've
>>> totally screwed up.
>>>
>>>> If we decide to add new types of filters or new commands, the syscall prototype
>>>> won't need to change. New commands can be added preserving backward
>>>> compatibility.
>>>> The basic TLV concept has been around forever in netlink world. imo it makes
>>>> sense to use it with new syscalls. Passing 'struct xxx' into syscalls is a
>>>> thing of the past. TLV style is more extensible. Fields of structures can
>>>> become optional in the future, new fields added, etc.
>>>> 'struct nlattr' brings the same benefits to kernel api as protobuf did
>>>> to user land.
>>>
>>> I see no reason to bring nl_attr into this.
>>>
>>> Admittedly, I've never dealt with nl_attr, but everything
>>> netlink-related I've ever been involved in has involved some sort of
>>> API atrocity.
>>
>> netlink has a lot of legacy and there is genetlink which is not pretty
>> either because of extra socket creation, binding, dealing with packet
>> loss issues, but the key concept of variable length encoding is sound.
>> Right now seccomp has two commands and they already don't fit
>> into a single syscall neatly. Are you saying there should be two syscalls
>> here? What about another seccomp related command? Another syscall?
>> imo all seccomp related commands need to be mux/demux-ed under
>> one syscall. What is the way to mux/demux potentially very different
>> commands under one syscall? I cannot think of anything better than
>> TLV style. 'struct nlattr' is what we have today and I think it works fine.
>> I'm not suggesting to bring the whole netlink into the picture, but rather
>> TLV style of encoding different arguments for different commands.
>
> I'm unconvinced. These are simple commands, and I think the interface
> should be simple. Syscalls are cheap.
>
> As an example, the interface could be:
>
> int seccomp_add_filter(const struct sock_fprog *filter, unsigned int flags);
>
> The "tsync" operation would be seccomp_add_filter(NULL,
> SECCOMP_ADD_FILTER_TSYNC) -- it's equivalent to adding an
> always-accept filter and syncing threads.

I think you convinced me that tsync should be part of adding a filter
(since now there are no failure side-effects), so I would expect this
specific example to return EFAULT. But ...

>
> But, frankly, this kind of stuff should probably be "do operation X".
> IIUC nl_attr is more like "do something, with these tags and values",
> which results in oddities like whatever should happen if more than one
> tag is set.

I have no objection to eliminating seccomp-strict from the syscall,
and just making this the "add seccomp filter" syscall. My only
hesitation would be that if we need something besides adding a filter
in the future, we'd be back to extending this awkwardly or adding
another syscall. That's why I went with the "operation" argument.

I'm not opposed to passing attrs and len, but seccomp_add_filter does
feel cleaner.

-Kees

-- 
Kees Cook
Chrome OS Security
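
To make the two shapes discussed in this thread concrete, here is a minimal userspace sketch in C. It is not kernel code and not the patch itself: every demo_* name, the struct layouts, and the choice of error codes are assumptions made for illustration. demo_seccomp() models the "operation implies the argument type" form with flags required to be 0, and demo_seccomp_tlv() models the (op, attrs, len) form, with struct demo_attr standing in for struct nlattr.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not kernel definitions. */
enum demo_op { DEMO_SET_MODE_STRICT = 0, DEMO_SET_MODE_FILTER = 1 };
enum demo_attr_type { DEMO_ATTR_BPF_FILTER = 1 };

struct demo_fprog {            /* stand-in for struct sock_fprog */
    unsigned short len;        /* number of filter instructions */
    const void *filter;        /* instructions (unused in this mock) */
};

struct demo_attr {             /* stand-in for a TLV header like nlattr */
    uint16_t len;              /* total length, header included */
    uint16_t type;             /* what the payload is */
};

/* Shape A: the operation alone decides how uargs is interpreted. */
static long demo_seccomp(unsigned int op, unsigned int flags, const void *uargs)
{
    if (flags != 0)
        return -EINVAL;        /* flags are reserved and must be 0 */

    switch (op) {
    case DEMO_SET_MODE_STRICT:
        return uargs == NULL ? 0 : -EINVAL;
    case DEMO_SET_MODE_FILTER: {
        const struct demo_fprog *prog = uargs;

        if (prog == NULL)
            return -EFAULT;    /* assumed: a NULL filter is a bad pointer */
        return prog->len > 0 ? 0 : -EINVAL;
    }
    default:
        return -EINVAL;        /* unknown operation */
    }
}

/* Shape B: the payload is tagged, so type and length are checked too. */
static long demo_seccomp_tlv(unsigned int op, const struct demo_attr *attrs,
                             int len)
{
    switch (op) {
    case DEMO_SET_MODE_STRICT:
        return (attrs == NULL && len == 0) ? 0 : -EINVAL;
    case DEMO_SET_MODE_FILTER:
        if (attrs == NULL || len < (int)sizeof(*attrs))
            return -EINVAL;    /* no room for even a header */
        if (attrs->len > len ||
            (size_t)attrs->len < sizeof(*attrs) + sizeof(struct demo_fprog))
            return -EINVAL;    /* attribute overruns or is too short */
        if (attrs->type != DEMO_ATTR_BPF_FILTER)
            return -EINVAL;    /* tag does not match the operation */
        return 0;
    default:
        return -EINVAL;
    }
}

/* Usage: exercise both shapes with one valid filter request each. */
static void demo_selfcheck(void)
{
    struct demo_fprog prog = { 1, NULL };
    struct {
        struct demo_attr hdr;
        struct demo_fprog prog;
    } msg = { { 0, DEMO_ATTR_BPF_FILTER }, { 1, NULL } };

    msg.hdr.len = (uint16_t)sizeof(msg);

    assert(demo_seccomp(DEMO_SET_MODE_STRICT, 0, NULL) == 0);
    assert(demo_seccomp(DEMO_SET_MODE_FILTER, 0, &prog) == 0);
    assert(demo_seccomp_tlv(DEMO_SET_MODE_FILTER, &msg.hdr,
                            (int)sizeof(msg)) == 0);
}
```

In this mock, shape B needs extra validation (length bounds and a tag that agrees with the operation) that shape A avoids, which is roughly the trade-off argued over above. Whether that cost buys enough extensibility is the open question in the thread.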