ptrace.2: BUGS (missing WIFEXITED notification)

Vegard Nossum <vegard.nossum@xxxxxxxxxx> · Tue, 12 May 2015 16:31:08 +0200

[resend with Cc: linux-man]

Hi again :-)

We hit another edge case in the ptrace() interface and after several
hours of chasing it down, we found that it was already described in the
"BUGS" section:

"If a thread group leader is traced and exits by calling _exit(2), a
PTRACE_EVENT_EXIT stop will happen for it (if requested), but the
subsequent WIFEXITED notification will not be delivered until all other
threads exit. As explained above, if one of other threads calls
execve(2), the death of the thread group leader will never be reported.
If the execed thread is not traced by this tracer, the tracer will never
know that execve(2) happened. One possible workaround is to
PTRACE_DETACH the thread group leader instead of restarting it in this
case. Last confirmed on 2.6.38.6."
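
For anyone following along, the two notifications mentioned in that
text look different in the status that waitpid() returns. Below is a
minimal sketch of telling them apart, assuming Linux with glibc. The
enum and classify_status() are hypothetical names used only for
illustration; the PTRACE_EVENT_EXIT test is the status>>8 comparison
documented in ptrace(2).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical names, for illustration only (not part of any library). */
enum tracee_event {
    EV_EXITED,     /* WIFEXITED: the final exit notification */
    EV_KILLED,     /* WIFSIGNALED: death by signal */
    EV_EXIT_STOP,  /* PTRACE_EVENT_EXIT stop: tracee is about to die */
    EV_OTHER_STOP, /* any other ptrace or signal stop */
    EV_UNKNOWN
};

/* Classify a status value as returned by waitpid() for a tracee. */
static enum tracee_event classify_status(int status)
{
    if (WIFEXITED(status))
        return EV_EXITED;
    if (WIFSIGNALED(status))
        return EV_KILLED;
    if (WIFSTOPPED(status)) {
        /* Event stops encode the event number above the stop signal. */
        if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXIT << 8)))
            return EV_EXIT_STOP;
        return EV_OTHER_STOP;
    }
    return EV_UNKNOWN;
}
```

In the case described in the BUGS text, a tracer would see EV_EXIT_STOP
for the group leader (if PTRACE_O_TRACEEXIT was requested) but would not
see EV_EXITED for it until all other threads are gone.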

I was going to write that we had seen the same thing for terminating
signals as well as for _exit(). However, we then came across this note
in the manual source:

.\" Note from Denys Vlasenko:
.\" Here "exits" means any kind of death - _exit, exit_group,
.\" signal death. Signal death and exit_group cases are trivial,
.\" though: since signal death and exit_group kill all other threads
.\" too, "until all other threads exit" thing happens rather soon
.\" in these cases. Therefore, only _exit presents observably
.\" puzzling behavior to ptrace users: thread leader _exit's,
.\" but WIFEXITED isn't reported! We are trying to explain here
.\" why it is so.

There is a case this note does not cover, however: the same behaviour
can also be observed for the other types of death if you are currently
tracing the other threads too!

In other words, when multiple threads are being traced and the group
leader exits, waitpid() on this group leader will hang indefinitely:
the other threads won't exit until we wait for them and then restart
(PTRACE_CONT) or detach (PTRACE_DETACH) them, and we don't receive the
exit notification for the group leader until the other threads have
really exited.
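
One way a tracer might avoid blocking on the leader alone is to wait
for any tracee instead of a specific pid. The sketch below assumes
Linux with glibc; wait_any_tracee() is a hypothetical helper name, and
we have not verified it against the scenario above. It only shows the
waitpid(-1, ..., __WALL) call shape, retried on EINTR.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: block until any child or tracee of the caller
 * changes state. Returns its id (or -1 on error) and stores the raw
 * wait status in *status.
 */
static pid_t wait_any_tracee(int *status)
{
    pid_t pid;

    do {
        /* -1: any child or tracee; __WALL: include threads (clone children). */
        pid = waitpid(-1, status, __WALL);
    } while (pid == -1 && errno == EINTR);

    return pid;
}
```

A tracer loop built on this would restart or detach whichever thread
reported, so the remaining threads can finish exiting and the leader's
WIFEXITED notification can eventually be delivered.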

To me, this means that not only _exit() but also other types of death
present "observably puzzling behavior to ptrace users".

I'd propose the following changes:

1) include some (if not all) of Denys's explanation in the actual text:

-If a thread group leader is traced and exits by calling _exit(2)...
+If a thread group leader is traced and exits for any reason (_exit,
+exit_group, signal death, etc.), ...

2) include the bits about tracing other threads:

+If the other threads in the thread group are being traced, they will
+not exit until they have been either waited for and restarted or
+detached, thereby blocking the exit notification (WIFEXITED) of the
+group leader to wait()/waitpid().

3) there's a typo in the original text:

-one of other threads
+one of the other threads

Feel free to rephrase any of the above.

Thoughts? We can also provide more details, including a reproducer, or
clarification if needed.

(PS: Please also credit Quentin Casasnovas with the report as we've both
spent more than a few hours tracking this down!)

Vegard