On Fri, Feb 17, 2012 at 4:55 PM, Indan Zupancic <indan@xxxxxx> wrote: > Hello, > > On Fri, February 17, 2012 17:23, Will Drewry wrote: >> On Thu, Feb 16, 2012 at 11:08 PM, Indan Zupancic <indan@xxxxxx> wrote: >>>> +/* Indicates if a tracer is attached. */ >>>> +#define SECCOMP_FLAGS_TRACED 0 >>> >>> That's not the best way to check if a tracer is attached, and if you did use >>> it for that, you don't need to toggle it all the time. >> >> It's logically no different than task->ptrace. If it is less >> desirable, that's fine, but it is functionally equivalent. > > Except that when using task->ptrace the ptrace code keeps track of it and > clears it when the ptracer goes away. And you're toggling SECCOMP_FLAGS_TRACED > all the time. Yep, the code is gone in the coming version. It was ugly to need to change it everywhere TIF_SYSCALL_TRACE was toggled. >>>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c >>>> index c75485c..f9d419f 100644 >>>> --- a/kernel/seccomp.c >>>> +++ b/kernel/seccomp.c >>>> @@ -289,6 +289,8 @@ void copy_seccomp(struct seccomp *child, >>>> { >>>> child->mode = prev->mode; >>>> child->filter = get_seccomp_filter(prev->filter); >>>> + /* Note, this leaves seccomp tracing enabled across fork. */ >>>> + child->flags = prev->flags; >>> >>> What if the child isn't ptraced? >> >> Then falling through with TIF_SYSCALL_TRACE will result in the >> SECCOMP_RET_TRACE events to be allowed, but this comes back to the >> race. If I can effectively "check" that ptrace did its job, then I >> think this becomes a non-issue. > > Yes. But it would be still sloppy state tracking, which can lead to > all kind of unlikely but interesting scenario's. If the child is ever > attached to later on, that flag will be still set. Same is true for > any descendant, they all will have that flag copied. Yup - it'd lead to tracehook fall through and an implicit allow. Not ideal at all. >>>> } >>>> >>>> /** >>>> @@ -363,6 +365,19 @@ int __secure_computing_int(int this_syscall) >>>> syscall_rollback(current, task_pt_regs(current)); >>>> seccomp_send_sigtrap(); >>>> return -1; >>>> + case SECCOMP_RET_TRACE: >>>> + if (!seccomp_traced(¤t->seccomp)) >>>> + return -1; >>>> + /* >>>> + * Delegate to TIF_SYSCALL_TRACE. This allows fast-path >>>> + * seccomp calls to delegate to slow-path if needed. >>>> + * Since TIF_SYSCALL_TRACE will be unset on ptrace(2) >>>> + * continuation, there should be no direct side >>>> + * effects. If TIF_SYSCALL_TRACE is already set, this >>>> + * has no effect. >>>> + */ >>>> + set_tsk_thread_flag(current, TIF_SYSCALL_TRACE); >>>> + /* Falls through to allow. */ >>> >>> This is nice and simple, but not race-free. You want to check if the ptracer >>> handled the event or not. If the ptracer died before handling this then the >>> syscall should be denied and the task should be killed. >> >> Hrm. I think there's a way to do this without forcing seccomp to >> always go slow path. I'll update the patch and see how it goes. > > You only have to go through the slow path for the SECCOMP_RET_TRACE case. You'd have to know at syscall entry time to decide or pre-scan the bpf filter to see if SECCOMP_RET_TRACE is returned non-programmatically, then add a TIF flag for slow pathing, but that seems crazy bad. > But yeah, toggling TIF_SYSCALL_TRACE seems the only way to avoid the slow > path, sometimes. The downside is that it's unexpected behaviour which may > clash with arch entry code, so I'm not sure if that's a good idea. I think > always going through the slow path isn't too bad, compared to the ptrace > alternative it's still a lot faster. Since supporting that behavior is documented as a prerequisite for adding HAVE_ARCH_SECCOMP_FILTER, I don't see how it could be unexpected behavior. On systems, like x86, where seccomp is always slowpath, it has no impact. However, it means that if a fast path is added (like audit), then it will need to know to re-check the threadinfo flags. I don't want to try to optimize in advance, but it'd be nice to not close off any options for later. If an explicit ptrace_event(SECCOMP) call was being made, then we'd be stuck in the slow path or stuck making the ptrace code have more ifs for determining if the source was a normal ptrace event or special seccomp-triggered one. That might be okay as a long-term-thing, though, since the other option (which the incoming patchset does) is to add a post-trace callback into seccomp. I'm not sure which is truly preferable. >>> Many people would like a PTRACE_O_KILL_TRACEE_IF_DEBUGGER_DIES option, >>> Oleg was working on that, among other things. Perhaps re-use that to >>> handle this case too? >> >> Well, if you can inject initial code into the tracee, then it can call >> prctl(PR_SET_PDEATHSIG, SIGKILL). Then when the tracer dies, the >> child dies. > > That only works for child tracees, not descendants of the tracee. True enough. >> If the SIGKILL race in arch_ptrace_... is resolved, then >> a SIGKILL that arrives between seccomp and delegation to ptrace should >> result in process death. Though perhaps my proposal above will make >> seccomp's integration with ptrace less subject to ptrace behaviors. > > Oleg fixed the SIGKILL problem (it wasn't a race), it should go upstream > in the next kernel version, I think. Pick your own name for it then, I guess. The signal lock was held in ptrace_notify. Then, in order to hand off to the arch_ptrace_notify code, it releases the lock, then claims it again after. If SIGKILL was delivered in that time window, then the post-arch-handoff code would see it, skip notification of the tracer, and allow the syscall to run prior to terminating the task. I'm excited to see it fixed :) >>>> case SECCOMP_RET_ALLOW: >>> >>> For this and the ERRNO case you could check that PTRACE_O_SECCOMP option and >>> decide to do something or not in ptrace. >> >> For ERRNO, I'd prefer not to since it adds implicit behavior to the >> rules and, without pulling a ptrace_event()ish call into this code, it >> would change the return flow and potentially open up errno, which >> should be solid, to races, etc. For ALLOW, sure, but at that point, >> just use PTRACE_SYSCALL. Perhaps this can all be ameliorated if I can >> get a useful ptrace_entry completed notification. > > You don't want ptrace to be able to override the decision? Fair enough. > Or did you mean something else? Exactly. The first time I went down this path, I let a tracer pick up any denied syscalls, but that complicated the interactions and security model quite a bit. I also don't want to add an implicit dependency on the syscall slow-path for any other return values -- just in case the proposed TIF_SYSCALL_TRACE approach isn't acceptable. thanks! will -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html