Kyle, James,

I have constructed a vfork test case that shows some of the problems I have using vfork reliably. It fails every time on my PA8700 system running 2.6.32-rc6. It appears that r28 (ret0) in the parent is being corrupted.

The test case is intended to do the following:

(a) vfork.
(b) Launch "ls -l" in the vfork'd child.
(c) Print some information in the parent.

~~~ vfork.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>   /* vfork, execve, _exit */

int
main (void)
{
  pid_t child;
  char *cmd[] = { "ls", "-l", (char *)0 };
  char *env[] = { "HOME=/tmp", (char *)0 };

  child = vfork();
  if (child == 0)
    {
      /* Child: replace the process image with ls. */
      execve("/bin/ls", cmd, env);
      /* Only reached if execve fails; a vfork child must not return. */
      _exit(127);
    }
  else
    {
      /* Parent: resumes once the child has called execve. */
      printf("child != 0\n");
    }
  printf("child is 0x%x\n", (unsigned int)child);
  return 0;
}
~~~

Compile this test case twice:

gcc -O1 -g -o vfork-O1 vfork.c
gcc -O0 -g -o vfork-O0 vfork.c

On x86, the results are:

~~~ vfork-O1
child != 0
child is 0x34e3
total 4824848
-rw-r--r-- 1 carlos carlos 25135 Sep 2 10:41 "${BuildArtifactFileName}.map"
...
[snip]
~~~

~~~ vfork-O0
child != 0
child is 0x3515
total 4824880
-rw-r--r-- 1 carlos carlos 25135 Sep 2 10:41 "${BuildArtifactFileName}.map"
~~~

These x86 runs are correct. However, on hppa, compiling with gcc version 4.3.4 (Debian 4.3.4-6), I get the following:

~~~ vfork-O1
./vfork-O1
child is 0x40552aac
carlos@firin:~/fsrc$ total 284
drwxr-xr-x 8 carlos carlos 4096 Jul 14 2005 binutils-old-work
...
[snip]
~~~

The value returned by vfork is corrupted in the parent. This gets worse at -O0:

~~~ vfork-O0
child is 0x405551a0
Illegal instruction
carlos@firin:~/fsrc$ total 284
drwxr-xr-x 8 carlos carlos 4096 Jul 14 2005 binutils-old-work
...
[snip file list]
~~~

The kernel says:

~~~
vfork-O0 (pid 16313): Illegal instruction (code 8) at 000000004054ec77

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001101111111100001111 Not tainted
r00-03  000000ff0006ff0f 00000000c06be968 000000004054ec70 00000000405551a4
r04-07  0000000040552aac 0000000000011b8a 0000000040553aac 00000000000f8c48
r08-11  00000000000f8a48 0000000000000000 00000000000c0000 00000000000f5808
r12-15  00000000000e8404 00000000000e8408 00000000ffffffff 0000000000000000
r16-19  0000000000000000 00000000000e6c40 00000000000e3fbc 00000000401f07c0
r20-23  0000000000000000 0000000040001900 00000000401e65f4 0000000000000000
r24-27  fffffffffffffff5 0000000000000000 00000000c06be42c 0000000000011b50
r28-31  0000000000000000 0000000000000000 0000000040552aac 000000004043d053
sr00-03  00000000053b5800 0000000000000000 0000000000000000 00000000053b5800
sr04-07  00000000053b5800 00000000053b5800 00000000053b5800 00000000053b5800

      VZOUICununcqcqcqcqcqcrmunTDVZOUI
FPSR: 00000000000000000000000000000000
FPER1: 00000000
fr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
fr04-07  0000000040197fac 00000000405628cc 000000004060b9a0 4040000000000000
fr08-11  00000000401fd7ac 0000000000000000 000000009f81ebb8 000000009f8403e0
fr12-15  0000000000000002 000000009f81ebb0 000000009f81ebc0 000000004060b9a0
fr16-19  0000000000000002 000000009f80eac0 0000000040552b48 00000000f000022c
fr20-23  0000000040657140 00000000404e32f8 0000000000000700 00001c8c00000000
fr24-27  0000000000000000 000000004011c8ac 0000000040652ca0 fffffffffffff000
fr28-31  0000000000000000 ffffffffffffff9c 00000000401d4a68 0000000000000000

IASQ: 00000000053b5800 00000000053b5800 IAOQ: 000000004054ec77 000000004054ec7b
 IIR: 0015e5b6    ISR: 0000000000000000  IOR: 0000000000000000
 CPU:        0   CR30: 00000000458a8000 CR31: ffffffffffffffff
 ORIG_R28: 0000000000000000
 IAOQ[0]: 000000004054ec77
 IAOQ[1]: 000000004054ec7b
 RP(r2): 000000004054ec70
~~~

However, when I run the -O1 version under ptrace (via strace), it works:

~~~
strace -o strace.log -ff ./vfork-O1
child != 0
child is 0x40b7
total 340
drwxr-xr-x 8 carlos carlos 4096 Jul 14 2005 binutils-old-work
...
[snip]
~~~

This indicates to me that the kernel is corrupting the parent process after vfork, and the test case shows it. I have reviewed the assembly generated by the compiler and cannot find anything wrong with it.

To take the C library out of the loop, I have attached a complete vfork implementation as used by glibc (pt-vfork.s). You can compile the test case against it using:

gcc -O1 -g -o vfork-O1 vfork.c pt-vfork.s

This makes the test call the vfork in pt-vfork.s, which lets you modify the instruction stream, up to and including removing the vfork system call itself (e.g. "ble 0x100(%sr2,%r0)"). For example, I use "iitlbp %r0,(%sr0, %r0)" to force a fault in either the parent or the child.

Please note that this pt-vfork.s isn't exactly what glibc uses: I have expanded the return sequence for both the child and the parent so you can cause a fault in one, the other, or both independently.

In summary:

* The test case works on x86.
* The test case fails on hppa.
* The test case works on hppa under strace.

What are we doing wrong, and where is the bug?

Cheers,
Carlos.
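A variant of the test case can make the corruption self-reporting instead of relying on reading the printed value. The sketch below is not part of the original report; it assumes Linux with glibc, and the helper name vfork_return_matches is hypothetical. The parent reaps the child with waitpid(-1, ...) rather than waitpid(child, ...), so the PID the kernel actually reports can be compared with whatever value vfork appeared to return in the parent.

~~~ vfork-check.c
#define _DEFAULT_SOURCE /* exposes the vfork() prototype in glibc's <unistd.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original report):
   vfork, exec PATH in the child, then check in the parent that the
   value vfork returned is the PID that waitpid() actually reaps.
   Returns 1 if they agree, 0 if they differ, -1 on error. */
int
vfork_return_matches (const char *path)
{
  char *argv[] = { (char *) path, (char *) 0 };
  char *envp[] = { "HOME=/tmp", (char *) 0 };
  int status;
  pid_t reaped;
  pid_t child = vfork ();

  if (child == 0)
    {
      /* Child: only execve or _exit are safe after vfork. */
      execve (path, argv, envp);
      _exit (127);              /* exec failed; never return from here */
    }
  if (child < 0)
    return -1;                  /* vfork itself failed */

  /* Wait for any child rather than for CHILD, so that a corrupted
     return value cannot make waitpid fail with ECHILD. */
  reaped = waitpid (-1, &status, 0);
  if (reaped < 0)
    return -1;

  printf ("vfork returned 0x%x, waitpid reaped 0x%x\n",
          (unsigned int) child, (unsigned int) reaped);
  return reaped == child;
}
~~~

Called as vfork_return_matches("/bin/true") from main, this should return 1 where vfork behaves correctly, as on the x86 runs above. If the parent's copy of the return value is being clobbered, one would expect a mismatch (0) instead; that has not been verified on hppa here. Even if the exec path does not exist, the child still exits via _exit, so the PID comparison remains meaningful.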
Attachment:
pt-vfork.s
Description: Binary data