Re: Indirect syscall restarted incorrectly after signal

David Daney <ddaney.cavm@xxxxxxxxx> · Mon, 12 Jan 2015 16:41:48 -0800

On 01/12/2015 04:31 PM, Ed Swierk wrote:
I'm trying to track down a strange problem affecting an O32 userspace
program on an N64 MIPS kernel. I'm using a 64-bit kernel from Cavium
that's nominally 3.10.20 but has an assortment of patches, including
rt and grsec and various Cavium stuff.

If you have all that stuff, you should also have access to the OCTEON
simulator, which can produce an instruction trace...

If you can create a testcase that fails within fewer than 10^9
instructions on the simulator (on, say, fewer than 4 CPUs), then it would
be child's play to find the problem...

David Daney

My glibc is stock 2.19-13 from
Debian.

The program is written in Go and compiled with gccgo from gcc 4.9.2.
It uses the exec.Command API, which is a Go wrapper for fork(),
exec(), wait() and friends. The runtime library (libgo) was changed
sometime before 4.9.2 to call clone() rather than fork() (see
http://patchwork.ozlabs.org/patch/386411/). Presumably for expediency,
the library invokes clone() indirectly via syscall(). Complicating
matters, the clone() calls are invoked from different threads, so the
program also has to deal with handling SIGCHLD whenever one of its
child processes exits.

Most of the time, the indirect clone() call works just fine.
Occasionally, however, the clone() gets interrupted by a signal. When
the signal handler returns, the kernel tries to restart the clone()
syscall by rolling back the program counter and various registers, and
jumping back into userspace at the point where the syscall was
originally invoked.

Running my program under strace looks like this (minus noise from
other processes/threads):

2530  syscall(0x1018, 0x12, 0, 0, 0, 0, 0 <unfinished ...>
2532  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3113,
si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
2530  <... syscall resumed> )           = ? ERESTARTNOINTR (To be restarted)
2530  syscall(0x12, 0, 0, 0, 0, 0, 0)   = -1 ENOSYS (Function not implemented)
2532  rt_sigreturn( <unfinished ...>
2532  <... rt_sigreturn resumed> )      = 2

syscall(0x1018) (where 0x1018 is the syscall number for clone on
32-bit MIPS) first returns ERESTARTNOINTR (as expected, this never
actually propagates back to userspace). But the next attempt uses
syscall number 0x12, which returns ENOSYS because there's no such
syscall.

I assume it is no coincidence that 0x12 is the first argument to the
original syscall.
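
One way to see why that would not be a coincidence is a toy model of an
indirect syscall handler. The sketch below is purely illustrative and is
not kernel code: the frame type, the function names, and the idea that the
handler shifts the arguments in place within the saved registers are all
assumptions, chosen only because they reproduce the numbers in the trace.
The O32 constants are the usual ones (numbers start at 4000, and clone is
4000 + 120 = 4120 = 0x1018).

```go
package main

import "fmt"

// Toy model only. Nothing here mirrors the kernel's real data structures.
const (
	o32Base   = 4000          // first O32 syscall number
	nrSyscall = o32Base       // the indirect syscall() entry itself
	nrClone   = o32Base + 120 // 4120 == 0x1018
)

// frame stands in for the saved user registers at syscall entry.
type frame struct {
	v0   uint32    // syscall number register
	args [7]uint32 // a0..a3 plus stack arguments, flattened
}

// indirect models the handler for syscall(nr, ...): the real number is the
// first argument and the remaining arguments shift down by one slot.
// With inPlace set, the shifted arguments overwrite the saved frame.
func indirect(f *frame, inPlace bool) (nr uint32, args [7]uint32) {
	nr = f.args[0]
	copy(args[:], f.args[1:]) // last slot stays zero
	if inPlace {
		f.args = args
	}
	return nr, args
}

// restartedNumber runs the handler twice on the same frame, as if the first
// attempt ended in ERESTARTNOINTR and the syscall instruction was re-run.
// It returns the real syscall number seen by the second attempt.
func restartedNumber(inPlace bool) uint32 {
	f := frame{v0: nrSyscall, args: [7]uint32{nrClone, 0x12, 0, 0, 0, 0, 0}}
	indirect(&f, inPlace) // first attempt, interrupted
	nr, _ := indirect(&f, inPlace)
	return nr
}

// valid reports whether nr could be an O32 syscall number at all.
func valid(nr uint32) bool { return nr >= o32Base }

func main() {
	for _, inPlace := range []bool{false, true} {
		nr := restartedNumber(inPlace)
		fmt.Printf("inPlace=%v restart sees %#x valid=%v\n", inPlace, nr, valid(nr))
	}
}
```

In this model, a handler that works on a copy sees 0x1018 again on restart,
while one that shifts the saved arguments in place sees 0x12, which is below
the O32 range and would be rejected with ENOSYS, just as in the trace.
Whether the kernel actually behaves this way is exactly the open question.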

For comparison, when I compile my program against the original libgo
which calls fork() and run it under strace I see the following:

16791 clone( <unfinished ...>
16792 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=17006,
si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
16792 rt_sigreturn( <unfinished ...>
16791 <... clone resumed> child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x3bc843c8) = ? ERESTARTNOINTR (To be restarted)
16792 <... rt_sigreturn resumed> )      = 0
16791 clone( <unfinished ...>
16791 <... clone resumed> child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x3bc843c8) = 17008

Note that the second call to clone() has exactly the same arguments as
the first one, and returns the new PID as expected.

I spent quite some time digging into the syscall code in the kernel
and glibc, but couldn't figure out who is supposed to shift arguments
and push some of them to the stack and others to registers, and so on.

I tried the same experiment with a 64-bit little-endian userspace from
the Debian mips64el repository and a gcc 4.9.2 toolchain targeting
mips64el. The program works fine. So the problem appears limited to
O32 userspace on an N64 kernel (it is not clear whether endianness is
an issue).

I can prepare a self-contained test case, but thought I'd first ask if
this symptom rings a bell with anyone on the list.

--Ed