On Mon, Jan 4, 2010 at 11:27 AM, Helge Deller <deller@xxxxxx> wrote:
> I think I have an idea what could have happened and why it most of the times (but not always) crashes in the child process...
>
> In ports/sysdeps/unix/sysv/linux/hppa/bits/atomic.h we have:
> #define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
> ({ \
>    volatile int lws_errno; \
>    volatile int lws_ret; \
>    asm volatile( \
>    ...some assembly...
>    "stw %%r28, %0 \n\t" \
>    "sub %%r0, %%r21, %%r21 \n\t" \
>    "stw %%r21, %1 \n\t" \
>    : "=m" (lws_ret), "=m" (lws_errno) \
>    : "r" (mem), "r" (oldval), "r" (newval) \
>    : _LWS_CLOBBER
>
> this means, that lws_errno and lws_ret are located on the stack.

Correct. We could place them in registers if we wanted, they are registers r28 (lws return) and r21 (lws error).

> With gdb I see this expanded to:
> 0x40705494 <start_thread+1204>: stw ret0,-1b8(sp)
> 0x40705498 <start_thread+1208>: sub r0,r21,r21
> 0x4070549c <start_thread+1212>: stw r21,-1b4(sp)
>
> So, lws_ret/lws_errno are at -1b8/-1b4(sp).

Correct.

> And this LWS code is called from
> ../nptl/sysdeps/pthread/createthread.c:
> static int create_thread (struct pthread *pd, const struct pthread_attr *attr, STACK_VARIABLES_PARMS)
> ...
>   int res = do_clone (pd, attr, clone_flags, start_thread,
>                       STACK_VARIABLES_ARGS, 1);
>   if (res == 0)
>     {
> ...(line 216):
>       /* Enqueue the descriptor. */
>       do
>         pd->nextevent = __nptl_last_event;
>       while (atomic_compare_and_exchange_bool_acq(&__nptl_last_event, pd, pd->nextevent) != 0);
>
>
> And here is what could have happened:
> a) do_clone() creates the child process.
> b) the child process gets a new stack
> c) the child calls atomic_compare_and_exchange_bool_acq() and thus the LWS code above.
> d) the LWS code writes to the stack location at -1b8(sp), which is out of bounds for the child process (the child stack got only ~ 0x40 bytes initial room)

This is wrong. Each thread should have 8MB of stack.
If we only get ~ 0x40 bytes then nptl/nptl-init.c is setting __default_stacksize incorrectly. Even PTHREAD_STACK_MIN should be 16 KB? Could you verify your assertion that only ~ 0x40 bytes of initial room were allocated?

> e) Thus the child either crashes, overwrites memory of the parent or does other things wrong.

I agree with your analysis, but the error is that more stack should be allocated.

Cheers,
Carlos.
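
One possible way to check the stack size question from userspace is sketched below. It is only a sketch, assuming glibc and its pthread_getattr_np() GNU extension. The helper names report_stack_size() and default_thread_stack_size() are made up for illustration and do not appear in glibc. It reports what the thread library believes it allocated for a default-attribute thread, not where sp actually points when start_thread runs.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>

/* Thread body (hypothetical helper): ask the thread library for the
   attributes of the running thread and store its stack size in *out. */
static void *report_stack_size(void *out)
{
    pthread_attr_t attr;
    void *stack_addr = NULL;
    size_t stack_size = 0;

    /* pthread_getattr_np() is a GNU extension, not POSIX. */
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstack(&attr, &stack_addr, &stack_size);
        pthread_attr_destroy(&attr);
    }
    *(size_t *)out = stack_size;
    return NULL;
}

/* Create one thread with default attributes and return the stack size
   (in bytes) that it reports, or 0 if anything failed. */
size_t default_thread_stack_size(void)
{
    pthread_t tid;
    size_t size = 0;

    if (pthread_create(&tid, NULL, report_stack_size, &size) != 0)
        return 0;
    pthread_join(tid, NULL);
    return size;
}
```

Calling default_thread_stack_size() from main() and printing the result next to PTHREAD_STACK_MIN should show whether the default is in the megabyte range or far smaller. Comparing the reported stack address range with the sp value seen in gdb would then show whether the -1b8(sp) store really falls outside it.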