On Thu, Jan 12, 2012 at 11:57 AM, Jamie Lokier <jamie@xxxxxxxxxxxxx> wrote:
> Will Drewry wrote:
>> On Thu, Jan 12, 2012 at 11:22 AM, Jamie Lokier <jamie@xxxxxxxxxxxxx> wrote:
>> > Will Drewry wrote:
>> >> On Thu, Jan 12, 2012 at 9:43 AM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>> >> > On Wed, 2012-01-11 at 11:25 -0600, Will Drewry wrote:
>> >> >
>> >> >> Filter programs may _only_ cross the execve(2) barrier if the last
>> >> >> filter program was attached by a task with CAP_SYS_ADMIN capabilities
>> >> >> in its user namespace. Once a task-local filter program is attached
>> >> >> from a process without privileges, execve will fail. This ensures that
>> >> >> only a privileged parent task can affect its privileged children
>> >> >> (e.g., setuid binary).
>> >> >
>> >> > This means that a non privileged user can not run another program with
>> >> > limited features? How would a process exec another program and filter
>> >> > it? I would assume that the filter would need to be attached first and
>> >> > then the execv() would be performed. But after the filter is attached,
>> >> > the execv is prevented?
>> >>
>> >> Yeah - it means tasks can filter themselves, but not each other.
>> >> However, you can inject a filter for any dynamically linked executable
>> >> using LD_PRELOAD.
>> >>
>> >> > Maybe I don't understand this correctly.
>> >>
>> >> You're right on. This was to ensure that one process didn't cause
>> >> crazy behavior in another. I think Alan has a better proposal than
>> >> mine below. (Goes back to catching up.)
>> >
>> > You can already use ptrace() to cause crazy behaviour in another
>> > process, including modifying registers arbitrarily at syscall entry
>> > and exit, aborting and emulating syscalls.
>> >
>> > ptrace() is quite slow and it would be really nice to speed it up,
>> > especially for trapping a small subset of syscalls, or limiting some
>> > kinds of access to some file descriptors, while everything else runs
>> > at normal speed.
>> >
>> > Speeding up ptrace() with BPF filters would be really nice. Not
>> > that I like ptrace(), but sometimes it's the only thing you can rely on.
>> >
>> > LD_PRELOAD and code running in the target process address space can't
>> > always be trusted in some contexts (e.g. the target process may modify
>> > the tracing code or its data); whereas ptrace() is pretty complete and
>> > reliable, if ugly.
>> >
>> > There's already a security model around who can use ptrace(); speeding
>> > it up needn't break that.
>> >
>> > If we'd had BPF ptrace in the first place, SECCOMP wouldn't have been
>> > needed as userspace could have done it, with exactly the restrictions
>> > it wants. Google's NaCl comes to mind as a potential user.
>>
>> That's not entirely true. ptrace supervisors are subject to races and
>> always fail open. This makes them effective but not as robust as a
>> seccomp solution can provide.
>
> What races do you know about?

I'm pretty sure that if you have two "isolated" processes, they could
cause irregular behavior using signals.

> I'm not aware of any ptrace races if it's used properly. I'm also not
> sure what you mean by fail open/closed here, unless you mean the target
> process gets to carry on if the tracing process dies.

Exactly. Security systems that, on failure, allow the action to proceed
can't be relied on.

> Having said that, I can think of one race, but I think your BPF scheme
> has the same one: After checking the syscall's string arguments and
> other pointed-to data, another thread can change those arguments
> before the real syscall uses them.

Not a problem - BPF only allows register inspection. No TOCTOU attacks
need apply :D

>> With seccomp, it fails closed. What I think would make sense would be
>> to add a user-controllable failure mode with seccomp bpf that calls
>> tracehook_ptrace_syscall_entry(regs). I've prototyped this and it
>> works quite well, but I didn't want to conflate the discussions.
>
> I think it's a nice idea.
> While you're at it could you fix all the
> architectures to actually use tracehooks for syscall tracing ;-)
>
> (I think it's ok to call the tracehook function on all archs though.)
>
>> Using ptrace() would also mean that all consumers of this interface
>> would need a supervisor, but with seccomp, the filters are installed
>> and require no supervisors to stick around for when failure occurs.
>>
>> Does that make sense?
>
> It does, I agree that ptrace() is quite cumbersome and you don't
> always want a separate tracing process, especially if "failure" means
> to die or get an error.
>
> On the other hand, sometimes when a failure occurs, having another
> process decide what to do, or log the event, is exactly what you want.
>
> For my nefarious purposes I'm really just looking for a faster way to
> reliably trace some activities of individual processes, in particular
> tracking which files they access. I'd rather not interfere with
> debuggers, so I'd really like your ability to stack multiple filters
> to work with separate-process tracing as well. And I'd happily use a
> filter rule which can dump some information over a pipe, without
> waiting for the tracer to respond in most cases.

Cool - if the rest of this discussion proceeds, then hopefully, we can
move towards discussing if tying it with ptrace is a good idea or a
horrible one :)

thanks!
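
P.S. A toy model of the register-inspection point above, for
illustration only. It is not kernel code and not the interface from the
patch: SyscallRegs, register_filter, path_checking_supervisor and
run_open are hypothetical names, the syscall numbers are placeholders,
and "memory" is a plain dict standing in for the task's address space.
It contrasts a supervisor that dereferences a path pointer at check
time (and can therefore be raced by a second thread) with a filter that
decides on the raw register values alone and refuses anything it does
not recognise.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Placeholder syscall numbers for this model (assumption, not from the thread).
SYS_WRITE = 1
SYS_OPEN = 2


@dataclass(frozen=True)
class SyscallRegs:
    """Immutable snapshot of the syscall number and raw argument registers."""
    nr: int
    args: Tuple[int, ...]


def register_filter(regs: SyscallRegs) -> str:
    """Decide from register values only; pointer arguments stay opaque integers."""
    if regs.nr == SYS_WRITE and regs.args[0] == 1:  # write() to fd 1
        return "allow"
    return "deny"  # fail closed: anything not explicitly allowed is refused


def path_checking_supervisor(regs: SyscallRegs, memory: Dict[int, str]) -> str:
    """Dereference args[0] at check time, as a pointer-inspecting supervisor might."""
    path = memory[regs.args[0]]
    return "allow" if path.startswith("/tmp/") else "deny"


def run_open(
    regs: SyscallRegs,
    memory: Dict[int, str],
    racing_write: Optional[Tuple[int, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Check the path, let a second thread rewrite it, then 'execute' the open."""
    verdict = path_checking_supervisor(regs, memory)
    if racing_write is not None:
        addr, value = racing_write
        memory[addr] = value  # the string changes after the check, before use
    opened = memory[regs.args[0]] if verdict == "allow" else None
    return verdict, opened


if __name__ == "__main__":
    memory = {0x1000: "/tmp/scratch"}
    regs = SyscallRegs(SYS_OPEN, (0x1000, 0, 0))
    # The supervisor approved one path; the modelled syscall then sees another.
    print(run_open(regs, memory, racing_write=(0x1000, "/etc/shadow")))
    # The register-only filter never looks behind the pointer.
    print(register_filter(regs))
    print(register_filter(SyscallRegs(SYS_WRITE, (1, 0x2000, 5))))
```

The trade-off shows up in register_filter: it cannot be raced because it
never reads the pointed-to string, but for the same reason it cannot
make path-based decisions at all.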