> > I tested the patch and the testcase in
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203
> > still segfaults.
>
> I think the expect/tcl bug and the bug 561203 are related. Looking
> at the minifail core dump, I see:
>
> Core was generated by `./minifail'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00000000 in ?? ()
>
> So, how did we get to 0? $rp is 0, so we might have executed a
> return to this location. $r31 contains 0x4157cc4f.
>
> (gdb) disass 0x4157cc3c 0x4157cc5c
> Dump of assembler code from 0x4157cc3c to 0x4157cc5c:
> 0x4157cc3c <_IO_puts+332>:      copy rp,r25
> 0x4157cc40 <_IO_puts+336>:      copy r6,r24
> 0x4157cc44 <_IO_puts+340>:      be,l b0(sr2,r0),sr0,r31
> 0x4157cc48 <_IO_puts+344>:      ldi 0,r20
> 0x4157cc4c <_IO_puts+348>:      ldi -b,r24
> 0x4157cc50 <_IO_puts+352>:      cmpb,=,n r24,r21,0x4157cc38 <_IO_puts+328>
> 0x4157cc54 <_IO_puts+356>:      nop
> 0x4157cc58 <_IO_puts+360>:      ldi -2d,r25

I think I have an idea what could have happened, and why it crashes
in the child process most of the time (but not always)...

In ports/sysdeps/unix/sysv/linux/hppa/bits/atomic.h we have:

#define atomic_compare_and_exchange_val_acq(mem, newval, oldval)  \
  ({                                                              \
     volatile int lws_errno;                                      \
     volatile int lws_ret;                                        \
     asm volatile(                                                \
        ...some assembly...
        "stw    %%r28, %0               \n\t"                     \
        "sub    %%r0, %%r21, %%r21      \n\t"                     \
        "stw    %%r21, %1               \n\t"                     \
        : "=m" (lws_ret), "=m" (lws_errno)                        \
        : "r" (mem), "r" (oldval), "r" (newval)                   \
        : _LWS_CLOBBER

This means that lws_errno and lws_ret are located on the stack.
With gdb I see this expanded to:

0x40705494 <start_thread+1204>: stw ret0,-1b8(sp)
0x40705498 <start_thread+1208>: sub r0,r21,r21
0x4070549c <start_thread+1212>: stw r21,-1b4(sp)

So, lws_ret/lws_errno are at -1b8/-1b4(sp).

And this LWS code is called from
../nptl/sysdeps/pthread/createthread.c:

static int
create_thread (struct pthread *pd, const struct pthread_attr *attr,
               STACK_VARIABLES_PARMS)
...
      int res = do_clone (pd, attr, clone_flags, start_thread,
                          STACK_VARIABLES_ARGS, 1);
      if (res == 0)
        {
...(line 216):
          /* Enqueue the descriptor. */
          do
            pd->nextevent = __nptl_last_event;
          while (atomic_compare_and_exchange_bool_acq(&__nptl_last_event,
                                                      pd, pd->nextevent) != 0);

And here is what could have happened:
a) do_clone() creates the child process.
b) The child process gets a new stack.
c) The child calls atomic_compare_and_exchange_bool_acq() and thus
   the LWS code above.
d) The LWS code writes to the stack location at -1b8(sp), which is
   out of bounds for the child process (the child stack only gets
   about 0x40 bytes of initial room).
e) Thus the child either crashes, overwrites memory of the parent,
   or misbehaves in some other way.

Additionally:
Because of the LWS assembly code, and because we don't have many
registers free while using LWS, gcc used %rp as a temporary register
(see the "ldi 0,rp" below), which may have misled us when we saw
$rp being 0:

0x40705458 <start_thread+1144>: ldi 0,rp
0x4070545c <start_thread+1148>: ldi fb,r3
0x40705460 <start_thread+1152>: ldw -70(sp),ret0
0x40705464 <start_thread+1156>: ldw 214(ret0),ret1
0x40705468 <start_thread+1160>: copy r5,r26
0x4070546c <start_thread+1164>: copy ret1,r25
0x40705470 <start_thread+1168>: copy rp,r24
0x40705474 <start_thread+1172>: be,l b0(sr2,r0),sr0,r31
0x40705478 <start_thread+1176>: ldi 0,r20
0x4070547c <start_thread+1180>: ldi -b,r24
0x40705480 <start_thread+1184>: cmpb,=,n r24,r21,0x40705468 <start_thread+1160>
0x40705484 <start_thread+1188>: nop
0x40705488 <start_thread+1192>: ldi -2d,r25
0x4070548c <start_thread+1196>: cmpb,=,n r25,r21,0x40705468 <start_thread+1160>
0x40705490 <start_thread+1200>: nop
0x40705494 <start_thread+1204>: stw ret0,-1b8(sp)
0x40705498 <start_thread+1208>: sub r0,r21,r21
0x4070549c <start_thread+1212>: stw r21,-1b4(sp)
0x407054a0 <start_thread+1216>: ldw -1b4(sp),ret0

If my assumptions are correct, then we could either

a) use the gcc atomic builtins instead of our own atomic code in
   libc6, e.g. add to
   ports/sysdeps/unix/sysv/linux/hppa/bits/atomic.h:

...
#if __GNUC_PREREQ (4, 1)
# define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
  __sync_val_compare_and_swap (mem, oldval, newval)
# define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
  (! __sync_bool_compare_and_swap (mem, oldval, newval))
#elif __ASSUME_LWS_CAS
....

b) change the assembly in atomic_compare_and_exchange_val_acq() so
   that it does not put its local variables (lws_errno and lws_ret)
   on the stack.

I'm currently testing option a).

Helge

(PS: I used a webmailer, so the indenting might be strange...)
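
For illustration, option a) applied to the enqueue loop quoted from createthread.c can be sketched as below. This is a standalone approximation, not the glibc code: `struct node`, `last_event`, `cas_bool_acq` and `enqueue` are hypothetical stand-ins for `struct pthread`, `__nptl_last_event`, `atomic_compare_and_exchange_bool_acq` and the loop at line 216. It only shows that a builtin-based macro keeps the "returns 0 on success" convention the loop relies on. Whether it actually avoids the stack stores on hppa is the open question still being tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pthread; only the link field matters. */
struct node {
    struct node *nextevent;
    int id;
};

/* Hypothetical stand-in for __nptl_last_event (head of the list). */
static struct node *last_event = NULL;

/* Same shape as the macro proposed in option a): evaluates to 0 when the
   swap succeeded and non-zero when *mem no longer held oldval. */
#define cas_bool_acq(mem, newval, oldval) \
    (!__sync_bool_compare_and_swap((mem), (oldval), (newval)))

/* Push pd onto the list head, retrying if another thread changed the
   head between the read and the compare-and-swap. */
static void enqueue(struct node *pd)
{
    assert(pd != NULL);
    do
        pd->nextevent = last_event;
    while (cas_bool_acq(&last_event, pd, pd->nextevent) != 0);
}
```

Calling `enqueue(&a)` followed by `enqueue(&b)` should leave `last_event` pointing at `b` with `b.nextevent` pointing at `a`. The builtin takes its arguments as (mem, oldval, newval), so the macro has to reorder them, as the proposed glibc macros also do.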