Oleg Nesterov <oleg@xxxxxxxxxx> writes: > Perhaps I am wrong, but I think you underestimate the problems, and it is > not clear to me if we really want this. I worked through quite a bit of it and I realized a few fundamental issues. The task struct must remain visible until it is reaped and we use siglock to protect in unhash process to protect that reaping. Further tsk->sighand == NULL winds up being a flag used to tell if release_task has been called. To get an usable count on sighand struct all that needed to be done was to change the reference counting of sighand_struct to count processes and not threads. Which is what I wound up posting. > Anyway, Eric, even if we can and want to do this, why we can't do this on > top of my fix? Because your reduction in scope of cred_guard_mutex is fundamentally broken and unnecessary. > I simply fail to understand why you dislike it that much. Yes it is not > pretty, I said this many times, but it is safe in that it doesn't really > change the current behaviour. No it is not safe. And it promotes wrong thinking which is even more dangerous. I reviewed the code and cred_guard_mutex needs to cover what it covers. > I am much more worried about 2/2 you didn't argue with, this patch _can_ > break something and this is obviously not good even if PTRACE_EVENT_EXIT > was always broken. I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually know what the implications of changing it are. Let's see... gdb - no upstart - no lldb - yes strace - no It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change to see if anything breaks. I think we can get away with changing the exec case but it does look worth testing. I hadn't realized you hadn't looked to see what was using PTRACE_O_TRACEEXIT to see if any part of userspace cares. Hmm. This is interesting. From the strace documentation: > Tracer cannot assume that ptrace-stopped tracee exists. There are many > scenarios when tracee may die while stopped (such as SIGKILL). > Therefore, tracer must always be prepared to handle ESRCH error on any > ptrace operation. Unfortunately, the same error is returned if tracee > exists but is not ptrace-stopped (for commands which require stopped > tracee), or if it is not traced by process which issued ptrace call. > Tracer needs to keep track of stopped/running state, and interpret > ESRCH as "tracee died unexpectedly" only if it knows that tracee has > been observed to enter ptrace-stop. Note that there is no guarantee > that waitpid(WNOHANG) will reliably report tracee's death status if > ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead. > IOW: tracee may be "not yet fully dead" but already refusing ptrace > ops. If delivering a second SIGKILL to a ptraced stopped processes will make it continue we have a very interesting out.. When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED) Delivery of a SIGKILL to that task has queue SIGKILL and call signal_wake_up_state(t, TASK_WAKEKILL). Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL) Which wakes up the process. So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT before the tracers find it. Therefore we are only talking a quality of implementation issue if we actually stop and wait for the tracer or not. .... Which brings us to your PTRACE_EVENT_EXIT patch. I think may_ptrace_stop is tested in the wrong place, and is probably buggy. - We should send the signal in all cases except when the ptracing parent does not exist aka (!current->ptrace). The siginfo contains enough information to understand what happened if anyone is listening. - Then we should send the group stop. - Then if we don't want to wait we should: __set_current_state(TASK_RUNNING) - Then we should drop the locks and only call freezable_schedule if we want to wait. That way userspace thinks someone else just sent a SIGKILL and killed the thread before it had a chance to look (which is effectively what we are doing). That sounds idea for both core-dumps and this case. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html