On 03/29/2011 03:57 PM, Gergely Kis wrote:
[...]
That said, how do you handle the case of getting a fault while reading
code/data while unwinding?
In case there is a fault, we basically return to the caller, so the
building of the callgraph is stopped. We looked at the ARM version and
they handled this case in a similar way.
But it looks like you are invoking get_user() from interrupt context?
As far as I know that is not allowed. Have you tested it?
I don't see where you handle faults when trying to read kernel memory.
Also I don't see how you handle these cases:
o Leaf functions where neither the $ra is saved to the stack, nor the stack
pointer adjusted.
We currently don't have any special handling for this, but we plan to
try to detect the prologue of leaf functions as well, if possible.
This detection process will probably never be 100% accurate, but we
have found the call graph outputs useful even in their current form.
Oprofile call graphs are not always accurate anyway, because of the
statistical nature of oprofile.
o Functions where $sp is adjusted several times (use of alloca() or VLAs).
o Functions with multiple return points (i.e. 'jr $ra' in the middle of a
function).
Yes, this is a shortcoming in the current implementation. We are
already working on changing the prologue detection to detect the exact
combination of the prologue instructions.
We are also looking at the stack unwinding function used to display
the kernel stacktraces when an oops or other error condition occurs,
to see if we can refactor it to suit our needs as well. This way a
single solution for stack walking could be included in the kernel.
o Functions with more than 1024 instructions.
Currently we set this (arbitrary) limit. We can probably change it or
make it configurable, but as long as we are using heuristics to detect
the function boundaries, I think we should have a maximum number of
allowed steps for the stack walking functions.
What do you think?
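
For reference, the kind of bounded prologue scan under discussion can be
sketched roughly as below. This is a user-space illustration only, not the
code from the patch: scan_prologue(), struct frame_info and MAX_SCAN_STEPS
are made-up names, the "code" is just an array of instruction words, and
only the two prologue instructions `addiu $sp,$sp,-N` and `sw $ra,off($sp)`
are recognized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SCAN_STEPS 1024	/* hypothetical name for the arbitrary cap */

struct frame_info {
	int found;	/* 1 if a stack adjustment was seen within the cap */
	int frame_size;	/* bytes subtracted from $sp by the prologue */
	int ra_offset;	/* offset of the saved $ra from $sp, -1 if none seen */
};

/* Sign-extend the 16-bit immediate field of an I-type instruction. */
static int simm16(uint32_t insn)
{
	int v = (int)(insn & 0xffffu);

	return (v & 0x8000) ? v - 0x10000 : v;
}

/* addiu $sp,$sp,imm encodes as 0x27bd0000 | imm */
static int is_addiu_sp(uint32_t insn)
{
	return (insn & 0xffff0000u) == 0x27bd0000u;
}

/* sw $ra,off($sp) encodes as 0xafbf0000 | off */
static int is_sw_ra_sp(uint32_t insn)
{
	return (insn & 0xffff0000u) == 0xafbf0000u;
}

/*
 * Walk backwards from code[pc_index - 1], giving up after MAX_SCAN_STEPS
 * instructions. The scan stops at the first negative $sp adjustment.
 */
struct frame_info scan_prologue(const uint32_t *code, size_t pc_index)
{
	struct frame_info fi = { 0, 0, -1 };
	size_t steps = 0;
	size_t i = pc_index;

	assert(code != NULL);

	while (i > 0 && steps < MAX_SCAN_STEPS) {
		uint32_t insn = code[--i];

		steps++;
		if (is_sw_ra_sp(insn) && fi.ra_offset < 0) {
			fi.ra_offset = simm16(insn);
		} else if (is_addiu_sp(insn) && simm16(insn) < 0) {
			fi.found = 1;
			fi.frame_size = -simm16(insn);
			break;
		}
	}
	return fi;
}
```

Without the cap, a scan that never meets a stack adjustment would keep
walking backwards through unrelated code until it hit something that looks
like a prologue or ran off the start of the buffer.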
My questions about your unwinding algorithm were really rhetorical in
nature.
It is not possible to do robust unwinding by code examination, precisely
because there is no way to identify the start of a function.
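
A small user-space sketch of the problem, assuming a scanner that simply
looks backwards for the nearest `addiu $sp,$sp,-N` (nearest_frame_size()
and the text[] array are invented for illustration, they are not taken from
any patch). A leaf function with no frame is placed directly after a
non-leaf function:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the frame size taken from the nearest `addiu $sp,$sp,-N` at or
 * before code[pc_index], or 0 if there is none. Nothing here knows where
 * the function containing pc_index actually starts.
 */
int nearest_frame_size(const uint32_t *code, size_t pc_index)
{
	size_t i = pc_index + 1;

	while (i-- > 0) {
		uint32_t insn = code[i];

		/* addiu $sp,$sp,imm with the immediate's sign bit set */
		if ((insn & 0xffff8000u) == 0x27bd8000u)
			return 0x10000 - (int)(insn & 0xffffu);
	}
	return 0;
}

const uint32_t text[] = {
	/* non_leaf: */
	0x27bdffe0u,	/* addiu $sp,$sp,-32 */
	0xafbf001cu,	/* sw    $ra,28($sp) */
	0x8fbf001cu,	/* lw    $ra,28($sp) */
	0x03e00008u,	/* jr    $ra */
	0x27bd0020u,	/* addiu $sp,$sp,32  (delay slot) */
	/* leaf: no frame, $ra never saved */
	0x00851021u,	/* addu  $v0,$a0,$a1 */
	0x03e00008u,	/* jr    $ra */
	0x00000000u,	/* nop               (delay slot) */
};
```

With the pc at index 5 (the first instruction of the leaf function),
nearest_frame_size(text, 5) reports a 32-byte frame that belongs to the
previous function. Stopping the scan at `jr $ra` would fix this particular
layout, but then it breaks for any function with a return in the middle.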
David Daney