engine interaction, callback order

Roland McGrath <roland@xxxxxxxxxx> · Wed, 29 Aug 2007 05:15:10 -0700 (PDT)

Renzo Davoli brought up the issue of the order of callbacks to different
engines for the same utrace event.  I know someone working on uprobes
brought this up before (probably on the systemtap mailing list some time
ago).  

In both cases, the purpose behind the practical interest in callback order
is the same.  Both syscall interception and breakpoint assistance do
emulation or "warping" kinds of things.  In the breakpoint case, it's
useful to execute copied instructions, so the true PC (and other register
values) to be executed in user mode can be changed from the unmolested user
program.  In the syscall case, user register values are tweaked to affect
whether and what actual syscalls can execute.  

These engines temporarily change the user state that executes.  So, you
want them to do their work last, so that other observing engines see the
unmolested user state before they tweak it.  But, you want them to do their
work first, so they restore the user state to the proper values to be
observed before other engines look.

A related issue is an engine wanting the last word on a signal's
disposition.  You may want to be last in report_signal and force a
particular disposition (like terminate) no matter what other engines had
decided about the signal.  Conversely, an engine filling the core dump
function (or starting the bug-reporter robot or whatever) wants to be the
last resort, taking over only if all the other engines (active debuggers,
etc.) are willing to let a fatal signal proceed.

Whether you want your register values in place only at the last moment (to
be run but not seen), or want your signal disposition decision to go in
effect only at the last moment, the problem is the same.  No order of
callbacks alone helps you.  Another engine can leave QUIESCE set when
you've finished all your callbacks and cleared QUIESCE.  Then your warped
register values are there to be seen by any engine doing asynchronous
regset access while it has QUIESCE set.  Or, the signal disposition to take
effect can be changed by utrace_inject_signal (or perhaps can't, in the
current implementation).  You don't have any way to react to what other
engines are doing.

utrace_inject_signal can't do any queuing--it's injecting the particular
action (possibly fatal) to be taken the very next thing, not a signal to be
considered and dispatched later.  But then what are you supposed to do when
another engine injected first and you get -EBUSY?  There isn't an
opportunity to get your word in there, to examine or replace the competing
disposition scheduled.  
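
To make the -EBUSY problem concrete, here is a toy userspace model of a
single-slot injection.  It is only a sketch: every name in it (model_thread,
model_inject, the ACT_* values) is hypothetical and none of it is the real
utrace interface.  It just shows that with one slot and no queue, the second
engine gets an error and no chance to examine or replace the first choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are the utrace API. */
enum model_action { ACT_NONE, ACT_IGNORE, ACT_HANDLER, ACT_TERMINATE };

struct model_thread {
    enum model_action pending; /* action to take the very next thing */
    int owner;                 /* id of the engine that injected it */
};

/* One slot, no queue: the first injection wins, later ones get -EBUSY. */
int model_inject(struct model_thread *t, int engine_id, enum model_action a)
{
    assert(t != NULL && a != ACT_NONE);
    if (t->pending != ACT_NONE)
        return -EBUSY; /* caller cannot inspect or replace the winner */
    t->pending = a;
    t->owner = engine_id;
    return 0;
}
```

In this model, an engine that wants to force termination after another
engine has injected a handler action simply loses, whatever the callback
order was.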

These issues lead to the idea of changing how report_quiesce and
report_signal work.  report_signal tells you "without intervention, user
mode will resume from right here and do this".  Currently report_quiesce
tells you that user mode might resume now if permitted, but also tells you
at some various places just that user mode is not running right now and
it's safe to look.  (At those latter places, the thread is not about to get
back to user mode (or terminate) without passing through some more event
points, though it may be working and blocking nonquiescently before then.)
So perhaps
rename report_signal to report_resume, and call it when dequeuing a signal
and when dequeuing none and preparing to return to user mode after having
stopped for QUIESCE.  That is, it's called when there is the opportunity to
decide the disposition of resuming (fatal or signal handler or normal).
Then do away with utrace_inject_signal entirely.  The EVENT(SIGNAL_*) bits
say an engine wants report_resume for those dispositions; EVENT(QUIESCE)
says the engine wants it no matter what.  An engine uses QUIESCE to get the
thread to call report_resume.  Then every interested engine gets the
callback, and can see the disposition choice left by the last engine, and
whether it's been left in QUIESCE.  
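
The shape of that proposal can be sketched as a toy userspace model.  This is
an illustration under assumptions, not a design: the struct, the callback
type and the two example engines are all hypothetical names, and the real
callback would carry much more context.  The point is only that each engine
sees the disposition left by the previous one and whether QUIESCE is still
wanted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the proposed report_resume pass. */
enum model_disp { DISP_NORMAL, DISP_HANDLER, DISP_FATAL };

struct model_resume {
    enum model_disp disp; /* disposition left by the previous engine */
    int quiesce;          /* nonzero if some engine wants the thread held */
    int dump_core;        /* set by the last-resort engine in this model */
};

typedef void (*model_resume_fn)(struct model_resume *st);

/* Call every engine in order; return 1 if the thread may resume. */
int model_report_resume(const model_resume_fn *engines, size_t n,
                        struct model_resume *st)
{
    size_t i;

    assert(engines != NULL && st != NULL);
    for (i = 0; i < n; i++)
        engines[i](st); /* each engine sees what earlier engines left */
    return !st->quiesce;
}

/* Example engine: forces a fatal disposition no matter what. */
void force_terminate_engine(struct model_resume *st)
{
    st->disp = DISP_FATAL;
}

/* Example engine: takes over only if the result is still fatal. */
void last_resort_engine(struct model_resume *st)
{
    st->dump_core = (st->disp == DISP_FATAL);
}
```

Here the order still matters (the last-resort engine has to run late), which
is where a callback priority scheme would come in.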

With report_resume as the center of "final user status" cooperation, we
can think about some sort of callback priority order covering the needs.

For the syscall emulation warping, I think it may be a happy special case
that can sidestep the whole complex engine interaction subject.  It may
make sense to have a first-class utrace feature via return values from
report_syscall callbacks, for skipping a syscall and for restoring register
values to their values before kernel entry.  Today an engine would use
tracehook_abort_syscall et al itself, which lets other engines see its
changes to the user registers or pseudo-register.  Instead, the utrace core
would make those changes after all the callbacks and after all the places
where engines could stop and look.
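
A toy userspace model of that idea follows.  It is a sketch only: the
MODEL_SYSCALL_* return bits, the register struct and the engines are
hypothetical, and the line that sets the return register to -ENOSYS merely
stands in for whatever the entry path changes on a given machine.  Engines
get a read-only view and ask for effects through return values; the core
applies them once every callback has run.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical return bits from a report_syscall-style callback. */
#define MODEL_SYSCALL_SKIP    0x1u /* do not run the syscall */
#define MODEL_SYSCALL_RESTORE 0x2u /* restore registers to entry values */

struct model_regs { long nr; long ret; };

typedef unsigned (*model_syscall_fn)(const struct model_regs *regs);

long model_observed_nr = -1; /* what the observing engine saw */

unsigned observer_engine(const struct model_regs *regs)
{
    model_observed_nr = regs->nr; /* sees the unmodified syscall number */
    return 0;
}

unsigned emulator_engine(const struct model_regs *regs)
{
    (void)regs; /* it asks for effects instead of editing registers */
    return MODEL_SYSCALL_SKIP | MODEL_SYSCALL_RESTORE;
}

/* Return 1 if the syscall should run, 0 if it is skipped. */
int model_syscall_entry(const model_syscall_fn *engines, size_t n,
                        struct model_regs *regs)
{
    struct model_regs at_entry;
    unsigned requests = 0;
    size_t i;

    assert(engines != NULL && regs != NULL);
    at_entry = *regs;    /* snapshot before the entry path touches anything */
    regs->ret = -ENOSYS; /* stand-in for a register changed on entry */
    for (i = 0; i < n; i++)
        requests |= engines[i](regs);
    if (requests & MODEL_SYSCALL_RESTORE)
        *regs = at_entry; /* applied only after all callbacks have run */
    return !(requests & MODEL_SYSCALL_SKIP);
}
```

In this model the observer sees the same syscall number whether it runs
before or after the emulating engine, so callback order stops mattering for
this case.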

I'd appreciate any feedback anyone has in any of these areas.

Thanks,
Roland