> I can't get minifail2 to fail, how are you choosing which stack
> address to watch for corruption?

The stack allocated by pthread_create is always placed at 0x40000000
in my runs. The last version I sent monitors 0x4000001c, which is
where thread_run saves %r26 shortly after entry.

I'm compiling without optimization as indicated in the comment.
Usually, I also link with -static since gdb is somewhat broken when
debugging shared libraries. If the compile is optimized, then
thread_run may not save much on the stack and it may be harder to get
it to fail.

The problem is timing dependent. I'm a bit surprised that you can't
duplicate the problem as Helge could. When I run the test loop, it
fails about one time in thirty on my c3750. I have a 250 Hz tick rate
in my config. The fail rate is somewhat dependent on system load. I
added the monitor loops to try to pin down when the corruption occurs,
but the test should fail if the thread's stack region gets corrupted
while thread_run executes.

Have you made any changes to libc which would affect the
synchronization of parent and child? I think the problem might be
fixed if the parent and thread were prevented from executing until the
fork is complete. I still think the kernel is interchanging the pages
used by the parent and child for the mmap'd stack region.

> Are you looking at the child's new stack created by mmap?

We are looking at the area allocated by mmap in the call to
pthread_create. This is always where the corruption occurs in the
subsequent fork call. This area is used as the stack for the thread
created by pthread_create. When the problem occurs, it's the thread
that generates the fault, never the original parent.

> Are you looking at the parent's stack?

No. However, the parent can see changes to the stack area used by the
thread created by the pthread_create call.
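
For readers without the test program at hand, the scenario described
above (a thread polling a slot on its pthread_create stack while the
parent forks) can be sketched roughly as follows. This is not
minifail2 itself: watch_stack, run_trial, CANARY and the two flags are
hypothetical names made up for this illustration, and the watched slot
is an ordinary local variable rather than the %r26 save location at
0x4000001c.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CANARY 0x5a5a5a5aUL      /* arbitrary pattern, an assumption of this sketch */

static atomic_int stop_flag;     /* set by the forking thread when it is done */
static atomic_int corrupt_seen;  /* set by the watcher if its stack slot changes */

/* Hypothetical stand-in for thread_run: keep a known value in a slot on
 * the thread's own stack and poll it until told to stop. */
static void *watch_stack(void *arg)
{
    volatile unsigned long canary = CANARY;  /* lives on the thread's stack */

    (void)arg;
    while (!atomic_load(&stop_flag)) {
        if (canary != CANARY) {
            atomic_store(&corrupt_seen, 1);
            break;
        }
    }
    return NULL;
}

/* Fork up to nforks times while the watcher thread runs.
 * Returns 1 if the watcher saw its slot change, 0 if not,
 * and -1 if the thread could not be created. */
int run_trial(int nforks)
{
    pthread_t tid;
    int i;

    atomic_store(&stop_flag, 0);
    atomic_store(&corrupt_seen, 0);
    if (pthread_create(&tid, NULL, watch_stack, NULL) != 0)
        return -1;

    for (i = 0; i < nforks; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;               /* stop forking early if fork fails */
        if (pid == 0)
            _exit(0);            /* child: exit immediately */
        waitpid(pid, NULL, 0);   /* parent: reap the child */
    }

    atomic_store(&stop_flag, 1);
    pthread_join(tid, NULL);
    return atomic_load(&corrupt_seen);
}
```

Calling run_trial from main in a loop, built without optimization
(for example with cc -O0 -pthread), should return 0 every time as long
as the thread's stack stays intact across the forks; a return of 1
would correspond to the kind of corruption discussed here.
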
As I understand the situation, the child of the fork call should see a
snapshot of this area as it was at the time of the fork call. fork is
nominally supposed to be "atomic" (see Open Group man page). In both
the parent and forked child, the first three words starting at
0x40000000 are always nonzero and correct. That's why I think the
pages are getting interchanged (I also played with adding extra cache
flushes in pacache.S but that didn't change things).

Dave
--
J. David Anglin                                  dave.anglin@xxxxxxxxxxxxxx
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)