Matt Helsley <matthltc@xxxxxxxxxx> writes: > On Mon, Apr 05, 2010 at 08:44:43PM -0700, Roland McGrath wrote: >> (I've been away for a couple of weeks.) >> I concur with the things Oleg's said in this thread. >> >> As to what's "correct" for the kernel in theory, it would certainly make >> sense to clean up the ptrace cases to use the tracer (parent) pid_ns when >> reporting any PID as such. The wait and SIGCHLD code already does this, so >> that would be consistent. Off hand I don't see anything other than >> tracehook_report_clone{,_complete}() that sees the wrong value now. > > Yup. > >> Fixing that requires a bit of hair. The simple and clean approach is to >> just have the tracehook calls (i.e. ptrace layer) extract the PID from the >> task_struct using the desired pid_ns. The trouble there is that the > > It's also possible to take an extra reference to the struct pid and pass > that to the tracehook. That and the pid_ns of the tracer receiving the pid > is all we'll ever need inside the tracehook layer. The only advantage, I > think, is we wouldn't pin the task struct while holding the pid reference. > >> tracehook_report_clone_complete() call is made when that task_struct is no >> longer guaranteed to be valid. The contrary approach of extracting the >> appropriate value for the tracer earlier breaks the clean layering because >> the fork.c code really should not know at all that ->parent->nsproxy is the >> place to look for what values to pass to tracehook calls. I guess the >> simple and clean fix is to get_task_struct() before wake_up_new_task() >> and put_task_struct() after tracehook_report_clone_complete(). That does >> add some gratuitous atomic incr/decr overhead, though. > > Also true. > > Of course my suggestion of holding the pid reference won't avoid adding > atomic ops -- just changes which refcount they work on. > >> >> None of this has much of anything to do with strace, of course. As I've >> said, I don't see anything other than the PTRACE_GETEVENTMSG value for >> PTRACE_EVENT_{CLONE,FORK,VFORK} reports that is wrong in the kernel. As >> Oleg said, strace doesn't use that at all. (This is not the place to >> discuss the details of strace further.) > > Also, looking at proposed changes (utrace and Eric Biederman's setns()) > storing a pid nr rather than a reference to a task struct or struct pid > probably won't be correct. My setns work has demonstrated that even for entering a namespace we never ever need to change the struct pid of a task. setns has no other bearing on this problem then to say there is no foreseeable reason to change the rules. > In the case of Eric Biederman's setns(), if capable of changing pid namespace, > we could have: > > Traced Tracer > fork() > ... (an arbitrary amount of time passes) > setns() > ptrace(GETEVENTMSG) Forget that. The pid namespace was architected so that we can ptrace a process in another pid namespace. > At which point returning a static pid number held in the message field > produces the wrong pid. No. A processes always sees pids from the context of it's original pid namespace. All setns does is affect which pid namespace children will be native in. Eric _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers