Jason,
thanks again for your very detailed analysis.

What is pthread_create returning? EAGAIN? If so, I would probably go the
way of "instead of solving the problem, just don't allow the problem to
appear". In other words, I would check res = pthread_create, and if it's
EAGAIN, just clear the conn_info variables like ref_count, state,
private_data, ... and call ipc_disconnect.

What do you think about that?

Honza

jason wrote:
> Hi All,
>
> My environment (corosync-1.4.5) encountered a segmentation fault at the
> following place.
>
> (gdb) bt
> #0 0x004f9012 in pthread_join () from /lib/libpthread.so.0
> #1 0x00ba6956 in conn_info_destroy (fd=15, revent=17, context=0x8dd78a0)
>     at coroipcs.c:503
> #2 coroipcs_handler_dispatch (fd=15, revent=17, context=0x8dd78a0)
>     at coroipcs.c:1617
> #3 0x0804c63b in corosync_poll_handler_dispatch (
>     handle=150346236434579456, fd=15, revent=17, context=0x8dd78a0)
>     at main.c:1105
> #4 0x00d7e994 in poll_run (handle=150346236434579456) at coropoll.c:513
> #5 0x0804d697 in main (argc=2, argv=0xbfd7ad54, envp=0xbfd7ad60)
>     at main.c:1874
> (gdb) f 1
> #1 0x00ba6956 in conn_info_destroy (fd=15, revent=17, context=0x8dd78a0)
>     at coroipcs.c:503
> 503 res = pthread_join (conn_info->thread, &retval);
> (gdb) p conn_info->thread
> $1 = 0
>
> gdb shows that pthread_join tried to join an ipc consumer thread which
> does not exist. The reason I found is that coroipcs_handler_dispatch()
> failed to create the thread and did not check the return value of
> pthread_create(), which failed due to out of memory. When this happens,
> the ipc client side sees the ipc connection created successfully, but all
> subsequent ipc requests block and never return. So I pressed CTRL+C to
> quit the client application and close the ipc connection at the client
> side. At this point, the server side calls pthread_join and gets the
> segmentation fault.
>
> The solution to the segmentation fault is simply to check whether
> conn_info->thread is zero in conn_info_destroy(). If it is, then we should
> skip the call to pthread_join() and decrease the ipc refcount (which was
> increased in coroipcs_handler_dispatch()).
>
> So I changed the conn_info_destroy() code to the following:
>
> if (conn_info->state == CONN_STATE_THREAD_REQUEST_EXIT) {
>     if (0 != conn_info->thread) {
>         res = pthread_join (conn_info->thread, &retval);
>     } else {
>         coroipcs_refcount_dec (conn_info);
>     }
>     conn_info->state = CONN_STATE_THREAD_DESTROYED;
>     return (0);
> }
>
> But this solution does nothing for the client ipc blocking problem: once
> the above code returns 0 to coropoll.c, coroipcs_handler_dispatch() gets
> no chance to be called again.
>
> Any ideas?
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
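
To make the suggestion at the top of this mail concrete, here is a rough sketch of "check res = pthread_create and back out on failure". It is not the corosync code: struct conn_sketch, start_consumer(), disconnect_sketch() and the injectable create function are all hypothetical stand-ins so that the failure path can be exercised without actually running out of memory. It also backs out on any non-zero return rather than only EAGAIN, which is an assumption on my side.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct conn_info; only the fields mentioned
 * in this thread are modelled, and the names are assumptions. */
enum conn_state_sketch { SKETCH_DISCONNECTED, SKETCH_THREAD_ACTIVE };

struct conn_sketch {
    int refcount;
    int thread_started;   /* explicit flag instead of testing thread == 0 */
    int disconnected;     /* set by the stand-in for ipc_disconnect */
    enum conn_state_sketch state;
    void *private_data;
    pthread_t thread;
};

/* Same shape as pthread_create(), so a test can inject a failing version. */
typedef int (*thread_create_fn)(pthread_t *, const pthread_attr_t *,
                                void *(*)(void *), void *);

/* Placeholder consumer thread body. */
void *consumer_sketch(void *arg)
{
    (void)arg;
    return NULL;
}

/* Stand-in for ipc_disconnect(): only records that it was called. */
void disconnect_sketch(struct conn_sketch *c)
{
    c->disconnected = 1;
}

/* Take a reference, try to start the consumer thread, and undo everything
 * if creation fails, so that no later code tries to join a thread that
 * never existed. Returns 0 or the error from the create function. */
int start_consumer(struct conn_sketch *c, thread_create_fn create)
{
    int res;

    c->refcount++;
    res = create(&c->thread, NULL, consumer_sketch, c);
    if (res != 0) {
        c->refcount--;
        c->thread_started = 0;
        c->state = SKETCH_DISCONNECTED;
        c->private_data = NULL;
        disconnect_sketch(c);
        return res;
    }
    c->thread_started = 1;
    c->state = SKETCH_THREAD_ACTIVE;
    return 0;
}

/* Test helper: behaves like pthread_create() under resource exhaustion. */
int failing_create(pthread_t *t, const pthread_attr_t *a,
                   void *(*fn)(void *), void *arg)
{
    (void)t; (void)a; (void)fn; (void)arg;
    return EAGAIN;
}
```

Passing pthread_create itself as the second argument gives the normal path; passing failing_create simulates the out-of-memory case. The explicit thread_started flag is used because a pthread_t value of 0 is not a portable "no thread" marker, so the destroy path can test the flag before calling pthread_join.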