Hi Jan,

this happens both with "corosync-cfgtool -l" and with a file in service.d.
It seems that something hoses the thread's internal data (TLS). According
to gdb, the pointer (&conn_info->mutex) passed to pthread_mutex_lock()
(via %rdi) is correct. I added a syslog() statement before the call to
pthread_mutex_lock() and found the program crashing inside syslog() instead.
This happens because libc's internal synchronization for threaded programs
also calls pthread_mutex_lock().

The crash happens here:

(gdb) frame 0
#0  0x00007f7ff68078e9 in pthread_mutex_lock () from /usr/lib/libpthread.so.1
(gdb) x/5i pthread_mutex_lock
   0x7f7ff68078e0 <pthread_mutex_lock>:     mov    %fs:0x0,%rax
=> 0x7f7ff68078e9 <pthread_mutex_lock+9>:   mov    0x10(%rax),%rdx
   0x7f7ff68078ed <pthread_mutex_lock+13>:  xor    %eax,%eax
   0x7f7ff68078ef <pthread_mutex_lock+15>:  lock cmpxchg %rdx,0x10(%rdi)
   0x7f7ff68078f5 <pthread_mutex_lock+21>:  test   %rax,%rax
(gdb) info reg fs rax rdi
fs             0x0                 0
rax            0x7f7ffffffffe      140187732541438
rdi            0x7f7ff738f050      140187585278032
(gdb) frame 1
#1  0x00007f7ff7002e14 in ipc_thread_active (conn=0x7f7ff738f000) at coroipcs.c:465
465             pthread_mutex_lock (&conn_info->mutex);
(gdb) p &conn_info->mutex
$2 = (pthread_mutex_t *) 0x7f7ff738f050

Probably not easy to fix...

Regards,

Stephan

2012/12/10 Jan Friesse <jfriesse@xxxxxxxxxx>:
> Stephan,
> is this happening only with pacemaker, or is this a general problem (with
> dynamic loading of plugins)? Can you test loading a different plugin
> at runtime (like one of the openais ones), or try to configure pacemaker
> to be loaded after start:
>
> service {
>         name: pacemaker
>         ver: 0
> }
>
> Regards,
>   Honza
>
> Stephan wrote:
>> Hi all,
>>
>> now that Corosync 1.x (1.4.4 in this case) works on NetBSD (6.0 amd64)
>> "out of the box", I compiled Pacemaker 1.0 and 1.1 and tried to run it
>> on top of corosync. Unfortunately, when I load Pacemaker using
>> "corosync-cfgtool -l pacemaker", corosync crashes with SIGSEGV.
>>
>> I already found this with gdb:
>>
>> -----8<--------
>> Core was generated by `corosync'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00007f7ff68078e9 in pthread_mutex_lock () from /usr/lib/libpthread.so.1
>> (gdb) bt full
>> #0  0x00007f7ff68078e9 in pthread_mutex_lock () from /usr/lib/libpthread.so.1
>> No symbol table info available.
>> #1  0x00007f7ff7002e14 in ipc_thread_active (conn=0x7f7ff5308000) at coroipcs.c:465
>>         conn_info = 0x7f7ff5308000
>>         retval = 0
>> #2  pthread_ipc_consumer (conn=0x7f7ff5308000) at coroipcs.c:674
>>         conn_info = 0x7f7ff5308000
>>         header = <optimized out>
>>         coroipc_response_header = {size = 660260756, id = 5, error = 0}
>>         send_ok = <optimized out>
>>         new_message = <optimized out>
>>         sem_value = 0
>> #3  0x00007f7ff6809d75 in ?? () from /usr/lib/libpthread.so.1
>> No symbol table info available.
>> #4  0x00007f7ff60759f0 in ___lwp_park50 () from /usr/lib/libc.so.12
>> No symbol table info available.
>> Cannot access memory at address 0x7f7ff0000000
>> (gdb) frame 1
>> #1  0x00007f7ff7002e14 in ipc_thread_active (conn=0x7f7ff5308000) at coroipcs.c:465
>> 465             pthread_mutex_lock (&conn_info->mutex);
>> (gdb) print &conn_info->mutex
>> $1 = (pthread_mutex_t *) 0x7f7ff5308050
>> (gdb) p *$
>> $2 = {ptm_magic = 858980355, ptm_errorcheck = 0 '\000', ptm_pad1 = "\000\000", ptm_interlock = 0 '\000', ptm_pad2 = "\000\000", ptm_owner = 0x0, ptm_waiters = 0x0, ptm_recursed = 0, ptm_spare2 = 0x0}
>> (gdb) frame 0
>> #0  0x00007f7ff68078e9 in pthread_mutex_lock () from /usr/lib/libpthread.so.1
>> (gdb) x/2i 0x00007f7ff68078e0
>>    0x7f7ff68078e0 <pthread_mutex_lock>:     mov    %fs:0x0,%rax
>> => 0x7f7ff68078e9 <pthread_mutex_lock+9>:   mov    0x10(%rax),%rdx
>> (gdb) info reg rax rdx
>> rax            0x7f7ffffffffe      140187732541438
>> rdx            0x0                 0
>> (gdb) x/p 0x7f7ffffffffe
>> 0x7f7ffffffffe: Cannot access memory at address 0x7f7ffffffffe
>> ----------
>>
>> - I think gdb tells us that there is a valid struct pthread_mutex_t in memory.
>> - I think that 4 bytes are copied to the address rax points to. In this
>> case rax points to the last page in the stack segment, crossing the
>> border to the next page, which is not mapped:
>>
>> 00007f7ffffe0000-00007f7fffffffff 128k 0000000000000000 rw-p- (rwx) 1/0/0 00:00 0 - [ stack ]
>>
>> Any idea about this?
>>
>> Regards,
>>
>> Stephan
>> _______________________________________________
>> discuss mailing list
>> discuss@xxxxxxxxxxxx
>> http://lists.corosync.org/mailman/listinfo/discuss