Re: [PATCH] man ptrace: add extended description of various ptrace quirks

Denys Vlasenko <vda.linux@xxxxxxxxxxxxxx> · Mon, 27 Feb 2012 01:58:09 +0100

On Sunday 26 February 2012 19:42, Michael Kerrisk wrote:
> Hello Denys,
> 
> Below is another iteration of the ptrace.2 page with your new
> material. Could you please take a look at the page in general, and the
> FIXMEs in particular? (I'd like to get specific input from you on all
> of the FIXMEs, if possible.)
> 
> Thanks,
> 
> Michael

...
...

> As for
> .BR PTRACE_PEEKUSER ,
> the offset must typically be word-aligned.
> In order to maintain the integrity of the kernel,
> some modifications to the USER area are disallowed.
> .\" FIXME In the preceding sentence, which modifications are disallowed,
> .\" and when they are disallowed, how does userspace discover that fact?
...
> As for
> .BR PTRACE_POKEUSER ,
> some general purpose register modifications may be disallowed.
> .\" FIXME In the preceding sentence, which modifications are disallowed,
> .\" and when they are disallowed, how does userspace discover that fact?

I don't know the answer to this question.

> Use of the
> .B WNOHANG
> flag may cause
> .BR waitpid (2)
> to return 0 ("no wait results available yet")
> even if the tracer knows there should be a notification.
> Example:
> .nf
> 
>     kill(tracee, SIGKILL);
>     waitpid(tracee, &status, __WALL | WNOHANG);
> .fi
> .\" FIXME: mtk: the following comment seems to be unresolved?
> .\"        Do you want to add anything?
> .\"
> .\"     waitid usage? WNOWAIT?
> .\"     describe how wait notifications queue (or not queue)

I have not experimented with waitid() and the WNOWAIT flag yet.
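
To illustrate the WNOHANG case quoted above: since waitpid() may return 0
right after kill(), a tracer that must not block can only retry later.
Below is a minimal sketch of such a retry loop. It is not taken from the
man page; the helper name wait_nohang_retry(), the 1 ms pause and the
retry limit are hypothetical choices made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

#ifndef __WALL
#define __WALL 0x40000000 /* Linux value, in case the header hides it */
#endif

/*
 * Hypothetical helper: poll waitpid() with WNOHANG until a wait result
 * shows up or max_tries attempts have been made.
 * Returns the pid on success, 0 if nothing was reported in time,
 * and -1 on error (errno is set by waitpid()).
 */
static pid_t wait_nohang_retry(pid_t pid, int *status, int max_tries)
{
    struct timespec pause_1ms = { 0, 1000000 };

    for (int i = 0; i < max_tries; i++) {
        pid_t r = waitpid(pid, status, __WALL | WNOHANG);
        if (r != 0)
            return r; /* either a result or an error */
        /* "no wait results available yet": back off briefly, retry */
        nanosleep(&pause_1ms, NULL);
    }
    return 0;
}
```

A tracer would call this after kill(tracee, SIGKILL) instead of assuming
that the very first WNOHANG call already sees the death notification.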

> .LP
> The following kinds of ptrace-stops exist: signal-delivery-stops,
> group-stop, PTRACE_EVENT stops, syscall-stops
> .\"
> .\" FIXME: mtk: the following text ("[, PTRACE_SINGLESTEP...") is incomplete.
> .\"        Do you want to add anything?
> .\"
> [, PTRACE_SINGLESTEP, PTRACE_SYSEMU,
> PTRACE_SYSEMU_SINGLESTEP].

I am not familiar enough with these ptrace commands to add anything useful.
You can just remove the [...] part for now.

> As of kernel 2.6.38,
> after the tracer sees the tracee ptrace-stop and until it
> restarts or kills it, the tracee will not run,
> and will not send notifications (except
> .B SIGKILL
> death) to the tracer, even if the tracer enters into another
> .BR waitpid (2)
> call.
> .LP
> .\" FIXME It is unclear what "this kernel behavior" refers to.
> .\" Can show me exactly which piece of text above or below is
> .\" referred to when you say "this kernel behavior"?
> Currently, this kernel behavior
> causes a problem with transparent handling of stopping signals:
> if the tracer restarts the tracee after group-stop,
> the stopping signal
> is effectively ignored\(emthe tracee doesn't remain stopped, it runs.
> If the tracer doesn't restart the tracee before entering into the next
> .BR waitpid (2),
> future
> .B SIGCONT
> signals will not be reported to the tracer.
> This would cause
> .B SIGCONT
> to have no effect.

You seem to be asking this question repeatedly. I tried to give you
the answer several times. I don't know what is unclear here.

Ok, I will try to explain it yet again.

Let's say a tracee receives a stopping signal and stops.
The tracer sees this stop via the waitpid() status
and determines that it is a group-stop.

After this, the tracer has two options: (1) execute ptrace(PTRACE_CONT)
on the tracee before going back to waitpid'ing, or (2) don't
do ptrace(PTRACE_CONT), and go back to waitpid'ing.

Both options are bad: in option (1), the tracee will start running,
so the stopping signal does not have its intended effect.
In option (2), the tracee will be stopped FOREVER: SIGCONT won't be able
to start it again.
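
For reference, the first step ("determines that it is a group-stop")
can be narrowed down from the waitpid() status alone, because only four
signals are stopping signals. The sketch below shows that pre-check.
The function name may_be_group_stop() is hypothetical, and the check is
only a filter: as far as I know, the definitive test is still that
ptrace(PTRACE_GETSIGINFO) on the stopped tracee fails with EINVAL.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: returns true if a waitpid() status *may* describe
 * a group-stop, i.e. the tracee is stopped and the stop signal is one of
 * the four stopping signals. A true result still has to be confirmed
 * with PTRACE_GETSIGINFO (failure with EINVAL means group-stop);
 * otherwise it is a signal-delivery-stop.
 */
static bool may_be_group_stop(int status)
{
    if (!WIFSTOPPED(status))
        return false;

    switch (WSTOPSIG(status)) {
    case SIGSTOP:
    case SIGTSTP:
    case SIGTTIN:
    case SIGTTOU:
        return true;
    default:
        return false; /* any other signal cannot be a group-stop */
    }
}
```

Whatever the tracer does after a positive result, it still faces the two
bad options described above.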

> Currently, this kernel behavior
> causes a problem with transparent handling of stopping signals:
> if the tracer restarts the tracee after group-stop,
> the stopping signal
> is effectively ignored

I am not a native English speaker. Please rephrase
this text fragment so that it sounds understandable to you.
I would agree to any version of it by now.

> But such detection is fragile and is best avoided.
> .LP
> Using the
> .B PTRACE_O_TRACESYSGOOD
> .\"
> .\" FIXME Below: "is the recommended method" for WHAT?
> option is the recommended method,
> since it is reliable and does not incur a performance penalty.

It is the recommended method to distinguish syscall-stops
from other kinds of ptrace-stops.
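
In code, the check looks roughly like this. It is a minimal sketch that
assumes the tracer has already set PTRACE_O_TRACESYSGOOD with
PTRACE_SETOPTIONS; the function name is_syscall_stop() is hypothetical.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: with PTRACE_O_TRACESYSGOOD set, syscall-stops are
 * reported with stop signal (SIGTRAP | 0x80), which no real signal and
 * no other kind of ptrace-stop uses.
 */
static bool is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}
```

Without the option, a syscall-stop shows a plain SIGTRAP and the tracer
would have to fall back on the fragile detection mentioned above.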

> If after syscall-enter-stop,
> the tracer uses a restarting command other than
> .BR PTRACE_SYSCALL ,
> syscall-exit-stop is not generated.
> .LP
> .B PTRACE_GETSIGINFO
> on syscall-stops returns
> .B SIGTRAP
> in
> .IR si_signo ,
> with
> .I si_code
> set to
> .B SIGTRAP
> or
> .IR (SIGTRAP|0x80) .
> .SS PTRACE_SINGLESTEP, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP stops
> .\"
> .\" FIXME The following TODO is unresolved
> .\"       Do you want to add anything, or (less good) do we just
> .\"       convert this into a comment in the source indicating
> .\"       that these points still need to be documented?
> .\"
> (TODO: document stops occurring with PTRACE_SINGLESTEP, PTRACE_SYSEMU,
> PTRACE_SYSEMU_SINGLESTEP)

I am not familiar enough with these ptrace commands to add anything useful.
You can just remove the (...) part for now.

> The design bug here is that a ptrace attach and a concurrently delivered
> .B SIGSTOP
> may race and the concurrent
> .B SIGSTOP
> may be lost.
> .\"
> .\" FIXME: mtk: the following comment seems to be unresolved?
> .\"	   Do you want to add any text?
> .\"
> .\"      Describe how to attach to a thread which is already group-stopped.

No, I don't have anything useful to add here right now.

> Another complication is that the tracee may enter other ptrace-stops
> and needs to be restarted and waited for again, until
> .B SIGSTOP
> is seen.
> Yet another complication is to be sure that
> the tracee is not already ptrace-stopped,
> because no signal delivery happens while it is\(emnot even
> .BR SIGSTOP .
> .\" FIXME: mtk: the following comment seems to be unresolved?
> .\"       Do you want to add anything?
> .\"
> .\"     Describe how to detach from a group-stopped tracee so that it
> .\"     doesn't run, but continues to wait for SIGCONT.

No, I don't have anything useful to add here right now.

> If the tracer dies, all tracees are automatically detached and restarted,
> unless they were in group-stop.
> Handling of restart from group-stop is
> .\" FIXME: Define currently
> currently buggy, but the
> .\" FIXME: Planned for when? And should applications be designed
> .\" in some way so as to allow for this future change?
> "as planned" behavior is to leave tracee stopped and waiting for
> .BR SIGCONT .

It means that current kernels are known to have bugs in this area:
if the tracer exits, group-stopped tracees may start running.

> Then a
> .B PTRACE_EVENT_EXEC
> stop happens, if the
> .BR PTRACE_O_TRACEEXEC
> option was turned on.
> .\" FIXME: mtk: the following comment seems to be unresolved?
> .\"       (on which tracee - leader? execve-ing one?)

At this point, the pid change has already occurred.
Currently, the rendered manpage looks like this:

*  All   other   threads   stop   in  PTRACE_EVENT_EXIT  stop,  if  the
   PTRACE_O_TRACEEXIT option was turned on.   Then  all  other  threads
   except  the  thread  group leader report death as if they exited via
   _exit(2) with exit code 0.  Then a PTRACE_EVENT_EXEC  stop  happens,
   if the PTRACE_O_TRACEEXEC option was turned on.

*  The  execing  tracee  changes  its  thread  ID  while  it  is in the
   execve(2).  (Remember, under ptrace, the "pid" returned  from  wait-
   pid(2),  or fed into ptrace calls, is the tracee's thread ID.)  That
   is, the tracee's thread ID is reset to be the same  as  its  process
   ID, which is the same as the thread group leader's thread ID.

*  If  the  thread group leader has reported its death by this time...

I suggest creating a new bullet point after the second one,
and moving "Then a PTRACE_EVENT_EXEC stop happens, if the
PTRACE_O_TRACEEXEC option was turned on" text into it.

This will clearly indicate that by this time, pid has changed.

There is a bit of text below:

> The thread ID change happens before
> .B PTRACE_EVENT_EXEC
> stop, not after.

which will be made redundant by the above change and can be deleted.

> .\" FIXME: Please check: at various places in the following,
> .\"        I have changed "pid" to "[the tracee's] thread ID"
> .\"        Is that okay?
> .IP *
> The execing tracee changes its thread ID while it is in the
> .BR execve (2).
> (Remember, under ptrace, the "pid" returned from
> .BR waitpid (2),
> or fed into ptrace calls, is the tracee's thread ID.)
> That is, the tracee's thread ID is reset to be the same as its process ID,
> which is the same as the thread group leader's thread ID.

Yes, the text looks ok to me.

> The
> .B PTRACE_O_TRACEEXEC
> option is the recommended tool for dealing with this situation.
> It enables
> .B PTRACE_EVENT_EXEC
> stop, which occurs before
> .BR execve (2)
> returns.
> .\" FIXME Following on from the previous sentences,
> .\"       can/should we add a few more words on how
> .\"       PTRACE_EVENT_EXEC stop helps us deal with this situation?
> .LP

I propose the following text:

The PTRACE_O_TRACEEXEC option is the recommended tool for dealing with
this situation. First, it enables PTRACE_EVENT_EXEC stop, which occurs
before execve(2) returns. In this stop, the tracer can use
ptrace(PTRACE_GETEVENTMSG) to retrieve the tracee's former thread ID.
(This feature was introduced in Linux 3.0.)
Second, the PTRACE_O_TRACEEXEC option disables legacy SIGTRAP generation
on execve(2).
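
For illustration only (this is not part of the proposed man page text),
a tracer could recognize the stop from the waitpid() status as sketched
below. The function name is_exec_event_stop() is hypothetical; the
status layout is the usual one for PTRACE_EVENT stops.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: PTRACE_EVENT stops are reported with
 * status >> 8 == (SIGTRAP | (PTRACE_EVENT_xxx << 8)).
 *
 * Once this returns true, the tracer can fetch the former thread ID:
 *     unsigned long old_tid;
 *     ptrace(PTRACE_GETEVENTMSG, pid, 0, &old_tid);
 * (per the text above, available for this stop since Linux 3.0).
 */
static bool is_exec_event_stop(int status)
{
    return (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8));
}
```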

> As of Linux 2.6.38, the following is believed to work correctly:
> .IP * 3
> exit/death by signal is reported first to the tracer, then,
> when the tracer consumes the
> .BR waitpid (2)
> result, to the real parent (to the real parent only when the
> whole multithreaded process exits).
> .\"
> .\" FIXME mtk: Please check: In the next line,
> .\" I changed "they" to "the tracer and the real parent". Okay?
> If the tracer and the real parent are the same process,
> the report is sent only once.

Yes, this change is ok.

> .B EPERM
> The specified process cannot be traced.
> This could be because the
> tracer has insufficient privileges (the required capability is
> .BR CAP_SYS_PTRACE );
> unprivileged processes cannot trace processes that they
> cannot send signals to or those running
> set-user-ID/set-group-ID programs, for obvious reasons.
> .\"
> .\" FIXME I reworked the discussion of init below to note
> .\" the kernel version (2.6.26) when the behavior changed for
> .\" tracing init(8). Okay?
> Alternatively, the process may already be being traced,
> or (on kernels before 2.6.26) be
> .BR init (8)
> (PID 1).

Yes, this change is ok.

> glibc currently declares
> .BR ptrace ()
> as a variadic function with only the
> .I request
> argument fixed.
> This means that unneeded trailing arguments may be omitted,
> though doing so makes use of undocumented
> .BR gcc (1)
> behavior.
> .\" FIXME Please review. I reinstated the following, noting the
> .\" kernel version number where it ceased to be true
> .LP
> In Linux kernels before 2.6.26,
> .\" See commit 00cd5c37afd5f431ac186dd131705048c0a11fdb
> .BR init (8),
> the process with PID 1, may not be traced.

Yes, this change is ok.

> .\" FIXME So, can we just remove the following text (rather than
> .\" just commenting it out)?
> .\"
> .\" Covered in more details above: (removed by dv)
> .\" .LP
> .\" Tracing causes a few subtle differences in the semantics of
> .\" traced processes.
> .\" For example, if a process is attached to with
> .\" .BR PTRACE_ATTACH ,
> .\" its original parent can no longer receive notification via
> .\" .BR waitpid (2)
> .\" when it stops, and there is no way for the new parent to
> .\" effectively simulate this notification.
> .\" .LP
> .\" When the parent receives an event with
> .\" .B PTRACE_EVENT_*
> .\" set,
> .\" the tracee is not in the normal signal delivery path.
> .\" This means the parent cannot do
> .\" .BR ptrace (PTRACE_CONT)
> .\" with a signal or
> .\" .BR ptrace (PTRACE_KILL).
> .\" .BR kill (2)
> .\" with a
> .\" .B SIGKILL
> .\" signal can be used instead to kill the tracee
> .\" after receiving one of these messages.
> .\" .LP

Yes, let's remove this comment.

> If a thread group leader is traced and exits by calling
> .BR _exit (2),
> .\" Note from Denys Vlasenko:
> .\"     Here "exits" means any kind of death - _exit, exit_group,
> .\"     signal death. Signal death and exit_group cases are trivial,
> .\"     though: since signal death and exit_group kill all other threads
> .\"     too, "until all other threads exit" thing happens rather soon
> .\"     in these cases. Therefore, only _exit presents observably
> .\"     puzzling behavior to ptrace users: thread leader _exit's,
> .\"     but WIFEXITED isn't reported! We are trying to explain here
> .\"     why it is so.
> a
> .B PTRACE_EVENT_EXIT
> stop will happen for it (if requested), but the subsequent
> .B WIFEXITED
> notification will not be delivered until all other threads exit.
> As explained above, if one of other threads calls
> .BR execve (2),
> the death of the thread group leader will
> .I never
> be reported.
> If the execed thread is not traced by this tracer,
> the tracer will never know that
> .BR execve (2)
> happened.
> One possible workaround is to
> .B PTRACE_DETACH
> the thread group leader instead of restarting it in this case.
> Last confirmed on 2.6.38.6.
> .\"        ^^^ need to test/verify this scenario
> .\" FIXME: mtk: the preceding comment seems to be unresolved?
> .\"        Do you want to add anything?

No, I don't have anything useful to add here right now.

-- 
vda