On 30.08.2016 01:55, Andy Lutomirski wrote:
> On Aug 29, 2016 11:30 AM, "Marcin Nowakowski"
> <marcin.nowakowski@xxxxxxxxxx> wrote:
>> Syscall metadata makes an assumption that only a single syscall number
>> corresponds to a given method. This is true for most archs, but
>> can break tracing otherwise.
>> For MIPS platforms, depending on the choice of supported ABIs, up to 3
>> system call numbers can correspond to the same call - depending on which
>> ABI the userspace app uses.
> MIPS isn't special here. x86 does the same thing. Why isn't this a
> problem on x86?
Hi Andy,
My understanding is that MIPS is quite different to what most other
architectures do ...
First of all, x86 disables tracing of compat syscalls, as that didn't work
properly because of the wrong mapping of syscall numbers to syscalls:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f431b634f
Moreover, when trace_syscalls is initialised, the syscall metadata is
updated to include the right syscall numbers. That uses the
arch_syscall_addr() function, which has a weak default implementation in
kernel/trace/trace_syscalls.c:
unsigned long __init __weak arch_syscall_addr(int nr)
{
	return (unsigned long)sys_call_table[nr];
}
That works for x86 and only uses 'native' syscalls, i.e. on x86_64 it will
not map any of the ia32_sys_call_table entries. So on one hand we have
the code that disables tracing for x86_64 compat syscalls, and on the other
we only ensure that the native calls are mapped.
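A toy user-space model of the generic lookup may make this concrete. It is a sketch, not kernel code: the two tables, their contents and the mapped_by_default() helper are hypothetical stand-ins; the only property taken from the real default is that it indexes the native table alone.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy user-space model, not kernel code: the tables and functions below
 * are hypothetical stand-ins for the native and compat syscall tables.
 */
typedef long (*syscall_fn)(void);

static long sys_read(void)         { return 0; }
static long sys_write(void)        { return 1; }
static long compat_sys_readv(void) { return 2; }

/* Native table: the only one the generic default consults. */
const syscall_fn sys_call_table[] = { sys_read, sys_write };
/* Compat table: it exists, but nothing below ever indexes it. */
const syscall_fn ia32_sys_call_table[] = { sys_read, compat_sys_readv };

#define NR_syscalls (sizeof(sys_call_table) / sizeof(sys_call_table[0]))

/* Same shape as the __weak default: nr indexes the native table only. */
static unsigned long default_arch_syscall_addr(int nr)
{
	assert(nr >= 0 && (size_t)nr < NR_syscalls);
	return (unsigned long)sys_call_table[nr];
}

/* 1 if fn is reachable through the default mapping for some nr, else 0. */
static int mapped_by_default(syscall_fn fn)
{
	for (size_t nr = 0; nr < NR_syscalls; nr++)
		if (default_arch_syscall_addr((int)nr) == (unsigned long)fn)
			return 1;
	return 0;
}
```

Here compat_sys_readv() has a slot in the compat table, but no syscall number ever resolves to it, which in the kernel would mean its metadata never gets a syscall number.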
It is quite different for MIPS, where each ABI has its own range of
syscall numbers (O32 -> 4xxx, N64 -> 5xxx, N32 -> 6xxx), and the following
code maps those numbers to the per-ABI syscall tables:
unsigned long __init arch_syscall_addr(int nr)
{
	if (nr >= __NR_N32_Linux &&
	    nr <= __NR_N32_Linux + __NR_N32_Linux_syscalls)
		return (unsigned long)sysn32_call_table[nr - __NR_N32_Linux];
	if (nr >= __NR_64_Linux &&
	    nr <= __NR_64_Linux + __NR_64_Linux_syscalls)
		return (unsigned long)sys_call_table[nr - __NR_64_Linux];
	if (nr >= __NR_O32_Linux &&
	    nr <= __NR_O32_Linux + __NR_O32_Linux_syscalls)
		return (unsigned long)sys32_call_table[nr - __NR_O32_Linux];

	return (unsigned long) &sys_ni_syscall;
}
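The range checks above amount to decoding a global syscall number into an (ABI, table index) pair. A minimal sketch of that decoding, assuming the 4000/5000/6000 bases; the table sizes, the enum and decode_syscall_nr() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Per-ABI base numbers on MIPS: O32 4xxx, N64 5xxx, N32 6xxx. */
#define NR_O32_BASE 4000
#define NR_N64_BASE 5000
#define NR_N32_BASE 6000

/* Hypothetical table sizes, chosen only for illustration. */
#define NR_O32_COUNT 360
#define NR_N64_COUNT 320
#define NR_N32_COUNT 325

enum mips_abi { ABI_NONE, ABI_O32, ABI_N64, ABI_N32 };

struct abi_range {
	enum mips_abi abi;
	int base;
	int count;
};

/* Checked in the same order as the MIPS arch_syscall_addr(). */
static const struct abi_range abi_ranges[] = {
	{ ABI_N32, NR_N32_BASE, NR_N32_COUNT },
	{ ABI_N64, NR_N64_BASE, NR_N64_COUNT },
	{ ABI_O32, NR_O32_BASE, NR_O32_COUNT },
};

/*
 * Map a global syscall number to its ABI and the index into that ABI's
 * table. Returns ABI_NONE (leaving *idx untouched) if nr is in no range.
 */
static enum mips_abi decode_syscall_nr(int nr, int *idx)
{
	assert(idx != NULL);
	for (size_t i = 0; i < sizeof(abi_ranges) / sizeof(abi_ranges[0]); i++) {
		const struct abi_range *r = &abi_ranges[i];

		/* Half-open range [base, base + count): index count is out of bounds. */
		if (nr >= r->base && nr < r->base + r->count) {
			*idx = nr - r->base;
			return r->abi;
		}
	}
	return ABI_NONE;
}
```

The sketch uses a half-open range, so a number equal to base + count (one past the last table entry) is rejected rather than indexed.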
As a result, when init_ftrace_syscalls() loops through all the possible
syscall numbers, it first finds the O32 implementation, then the N64 one
and finally the N32 one. As the current code doesn't expect multiple
syscall numbers to refer to the same syscall, it always overwrites the
metadata with the last one found; as a result only the N32 syscalls are
mapped.
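A toy model of that loop, assuming one metadata instance per syscall function with a single syscall_nr slot (as struct syscall_metadata has); toy_meta, toy_find_meta() and toy_init_syscalls() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct syscall_metadata: one instance per syscall function. */
struct toy_meta {
	int syscall_nr;		/* a single slot: only one number fits */
};

static struct toy_meta meta_read = { -1 };

#define TOY_NR_SYSCALLS 7000

/*
 * Stand-in for arch_syscall_addr() + find_syscall_meta(): on MIPS, read is
 * 4003 (O32), 5000 (N64) and 6000 (N32), and all three resolve to sys_read,
 * i.e. to the same metadata instance.
 */
static struct toy_meta *toy_find_meta(int nr)
{
	if (nr == 4003 || nr == 5000 || nr == 6000)
		return &meta_read;
	return NULL;
}

/* Mimics the init loop: walk every number and record it in its metadata. */
static void toy_init_syscalls(void)
{
	for (int i = 0; i < TOY_NR_SYSCALLS; i++) {
		struct toy_meta *meta = toy_find_meta(i);

		if (!meta)
			continue;
		meta->syscall_nr = i;	/* each later hit overwrites the earlier one */
	}
}
```

After toy_init_syscalls() runs, meta_read.syscall_nr is 6000, the N32 number; the O32 and N64 numbers were recorded and then overwritten.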
This is generally unexpected and wrong behaviour, and to make things
worse, since the N32 entries overwrite the N64 ones when N32 support is
enabled, it becomes impossible to trace native syscalls.
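Why that breaks native tracing can be sketched the same way, assuming (roughly as kernel/trace/trace_syscalls.c does) that enabling an event marks the number stored in its metadata, and that the entry hook checks the number actually invoked. All names below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SYSCALLS 7000

/* Toy stand-in for struct syscall_metadata: a single syscall_nr slot. */
struct toy_meta {
	int syscall_nr;
};

/* State after the init loop: sys_read's metadata kept only the N32 number. */
static struct toy_meta meta_read = { 6000 };

/* One flag per syscall number, like a per-number "enabled" bitmap. */
static bool enabled_enter[TOY_NR_SYSCALLS];

/* Enabling the event marks only the number recorded in its metadata. */
static void toy_enable_event(const struct toy_meta *meta)
{
	assert(meta->syscall_nr >= 0 && meta->syscall_nr < TOY_NR_SYSCALLS);
	enabled_enter[meta->syscall_nr] = true;
}

/* Entry hook: trace only if the number actually invoked was enabled. */
static bool toy_would_trace(int nr)
{
	return nr >= 0 && nr < TOY_NR_SYSCALLS && enabled_enter[nr];
}
```

With the metadata left pointing at 6000, enabling the read event only ever matches N32 calls; a native N64 read (5000) is silently skipped.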
> Also, you seem to be partially reinventing AUDIT_ARCH here. Can you
> use that and integrate with syscall_get_arch()?
Please correct me if I've misunderstood what you meant here, but I don't
see how the two can be integrated ...
For MIPS, syscall_get_arch() correctly determines the arch type and calling
convention, but that information is not enough to determine which call
was made and how to map it to the syscall metadata of another calling
convention.
Marcin