Jason,
can you please try to apply the patches
"[PATCH 1/3] coroipc: Don't spin when waiting on semaphore",
"[PATCH 2/3] On places with POLLERR check also POLLNVAL" and
"[PATCH 3/3] Check socket_recv error code in ipc_dispatch_get"
and let me know if the problem still exists? By problem I don't mean the
assert (I've removed that), but CS_ERR_LIBRARY because the FD was closed.

Honza

Jan Friesse wrote:
> Hi,
>
> jason wrote:
>> Hi Jan,
>> I haven't applied the patch you suggested (in order to catch it again).
>> But I have reproduced this issue and found something that looks very
>> strange to me:
>>
>> 1) corosync_dispatch_get() got called by our CLM client after returning
>> from polling the file descriptor of the IPC channel. But I don't think
>> there is any dispatch request generated from the server side, because we
>> have only one node in the cluster, so no cluster membership change
>> notification can happen.
>
> Are you polling on the clm descriptor? If so, there is probably some
> dispatch request.
>
>>
>> 2) the poll() in corosync_dispatch_get() returns 1 but errno is not
>> zero, it is EINTR! Then it runs into socket_recv(). Can poll really act
>> like this? Or does the errno belong to the previous syscall?
>
> I believe errno really belongs to the previous call. Or, there is another
> possibility. Are you using signals in your application? If so, aren't
> you calling some syscalls in the handlers?
>
>>
>> 3) in socket_recv(), the call to recvmsg() returns -1 and errno is EBADF.
>> So after it returns to corosync_dispatch_get(), the assertion is raised.
>
> This is really weird.
>
>> On Dec 3, 2012 5:34 PM, "jason" <huzhijiang@xxxxxxxxx> wrote:
>>
>>> Thanks Jan,
>>> I will give it a try and find out the initial reason why this issue
>>> appears in my environment.
>>> On Dec 3, 2012 5:12 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
>>>
>>>> Hi,
>>>> honestly I'm really unsure why this assert is there. Actually, it really
>>>> looks like a thing which shouldn't be there at all. I would suggest a
>>>> patch, which simply:
>>>> - removes the whole #if defined (as it doesn't seem to be needed)
>>>> - removes the assert
>>>> - checks if error is CS_OK and if not, goto error_put
>>>>
>>>> Regards,
>>>> Honza
>>>>
>>>> jason wrote:
>>>>> Hi All,
>>>>> We have encountered an assertion at coroipc.c:925 and it seems hard to
>>>>> reproduce. According to the code of corosync-1.4.4 it means
>>>>> socket_recv() did not return CS_OK as expected by
>>>>> coroipcc_dispatch_get(). But I checked socket_recv() and found that it
>>>>> does return CS_ERR_TRY_AGAIN or CS_ERR_LIBRARY in some cases. So does
>>>>> it really need this assertion, or do we need to deal with the
>>>>> != CS_OK cases at coroipc.c:925?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> _______________________________________________
>>>>> discuss mailing list
>>>>> discuss@xxxxxxxxxxxx
>>>>> http://lists.corosync.org/mailman/listinfo/discuss
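
On the errno question in point 2 above: errno is only meaningful when the call has just returned -1, so a poll() that returns 1 with errno == EINTR is simply carrying a stale value from an earlier call. A minimal sketch of that rule, combined with the POLLERR/POLLNVAL check named in patch 2/3, could look like the following. The helper name wait_readable() and its return convention are made up for illustration; this is not the code from the patches.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of corosync.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error or invalid fd.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    /* Retry only when poll() itself failed with EINTR.
     * errno is looked at only after a -1 return; on success it may
     * still hold a value left over from some earlier syscall. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1) {
        return -1;              /* real poll() failure */
    }
    if (rc == 0) {
        return 0;               /* timeout, nothing to read */
    }
    /* A closed descriptor shows up as POLLNVAL in revents, not as a
     * poll() error, so it has to be checked explicitly. */
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        return -1;
    }
    return 1;
}
```

With a check like this, a descriptor that was closed behind the library's back is reported before recvmsg() gets the chance to fail with EBADF.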