Re: Coroipcc_dispatch_get assert

Hi Jan,
Sorry. Given my schedule and the limited resources in my office, I have no opportunity to test your patches these days. I will do so as soon as the resources are available to me again.

On 2012-12-6 at 6:37 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
Jason,
can you please try to apply patches "[PATCH 1/3] coroipc: Don't spin
when waiting on semaphore", "[PATCH 2/3] On places with POLLERR check
also POLLNVAL" and "[PATCH 3/3] Check socket_recv error code in
ipc_dispatch_get" and let me know if the problem still exists?

By problem I mean not the assert (I've removed that), but the
CS_ERR_LIBRARY returned because the FD was closed.

Honza

Jan Friesse wrote:
> Hi,
>
> jason wrote:
>> Hi Jan,
>> I haven't applied the patch you suggested (in order to catch it again). But I have
>> reproduced this issue and found something very strange to me:
>>
>> 1) coroipcc_dispatch_get() got called by our CLM client after returning from
>> polling the file descriptor of the IPC channel. But I don't think any
>> dispatch request was generated by the server side, because we have only one node
>> in the cluster, so no cluster membership change notification can happen.
>
> Are you polling on the CLM descriptor? If so, there is probably some
> dispatch request.
>
>>
>> 2) the poll() in coroipcc_dispatch_get() returns 1 but errno is not
>> zero, it is EINTR! Then it runs into socket_recv(). Can poll() really act
>> like this? Or does the errno belong to a previous syscall?
>
> I believe errno really belongs to a previous call. Or, there is another
> possibility. Are you using signals in your application? If so, aren't
> you calling some syscalls in the signal handlers?
>
>>
>> 3) in socket_recv(), the call to recvmsg() returns -1 and errno is EBADF.
>> So after it returns to coroipcc_dispatch_get(), the assertion is raised.
>
> This is really weird.
>
>> On Dec 3, 2012 5:34 PM, "jason" <huzhijiang@xxxxxxxxx> wrote:
>>
>>> Thanks Jan,
>>> I will give it a try and find out the root cause of why this issue
>>> appears in my environment.
>>> On Dec 3, 2012 5:12 PM, "Jan Friesse" <jfriesse@xxxxxxxxxx> wrote:
>>>
>>>> Hi,
>>>> honestly I'm really unsure why this assert is there. Actually, it really
>>>> looks like a thing which shouldn't be there at all. I would suggest a
>>>> patch, which simply:
>>>> - removes the whole #if defined block (as it doesn't seem to be needed)
>>>> - removes the assert
>>>> - checks whether the error is CS_OK and, if not, does goto error_put
>>>>
>>>> Regards,
>>>>   Honza
>>>>
>>>> jason wrote:
>>>>> Hi All,
>>>>> We have encountered an assertion failure at coroipc.c:925, and it seems hard to
>>>>> reproduce. According to the code of corosync-1.4.4, it means socket_recv()
>>>>> did not return CS_OK as expected by coroipcc_dispatch_get(). But I checked
>>>>> socket_recv() and found that it does return CS_ERR_TRY_AGAIN or
>>>>> CS_ERR_LIBRARY in some cases. So is this assertion really needed, or do
>>>>> we need to handle the non-CS_OK cases at coroipc.c:925?
>>>>>
>>>>> Thank you!
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> discuss mailing list
>>>>> discuss@xxxxxxxxxxxx
>>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>>
>>>>
>>>>
>>
>
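
On point 2 of the quoted discussion (poll() returning 1 while errno reads EINTR): errno is only meaningful after a call has reported failure, so a positive return from poll() with a leftover errno value is not an error by itself. Below is a minimal sketch, not corosync code, of a wait helper that looks at errno only when poll() returns -1 and that treats POLLNVAL like POLLERR, in the spirit of the patch titles quoted above. The names wait_readable and enum wait_result are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical result codes for this sketch; these are not cs_error_t values. */
enum wait_result { WAIT_READY, WAIT_RETRY, WAIT_TIMEOUT, WAIT_FAILED };

/* Wait until fd is readable, inspecting errno only when poll() itself fails. */
enum wait_result wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc == -1) {
        /* Only on this path does errno describe the poll() call. */
        return (errno == EINTR) ? WAIT_RETRY : WAIT_FAILED;
    }
    if (rc == 0) {
        return WAIT_TIMEOUT;
    }
    /* POLLNVAL means the descriptor is not open, e.g. it was closed elsewhere. */
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        return WAIT_FAILED;
    }
    if (pfd.revents & POLLIN) {
        return WAIT_READY;
    }
    /* Anything else (for example POLLHUP with no data left) is a failure here. */
    return WAIT_FAILED;
}
```

With a helper shaped like this, a stale EINTR left behind by an earlier syscall (or by a syscall made inside a signal handler) cannot be mistaken for a poll() failure, and a closed descriptor is reported before any receive is attempted.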

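The change suggested in the oldest quoted reply (drop the assert, check for CS_OK, and goto error_put otherwise) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual corosync patch: sk_error_t, sketch_recv and sketch_dispatch_get are hypothetical stand-ins, the refs counter stands in for whatever reference the real function takes and releases, and MSG_DONTWAIT is assumed to be available (it is on Linux and the BSDs).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical stand-ins for CS_OK, CS_ERR_TRY_AGAIN and CS_ERR_LIBRARY. */
typedef enum { SK_OK, SK_ERR_TRY_AGAIN, SK_ERR_LIBRARY } sk_error_t;

/* Receive one byte without blocking and map failures to error codes. */
sk_error_t sketch_recv(int fd, char *out)
{
    ssize_t n = recv(fd, out, 1, MSG_DONTWAIT);

    if (n == -1) {
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) {
            return SK_ERR_TRY_AGAIN;
        }
        return SK_ERR_LIBRARY; /* e.g. EBADF when the descriptor was closed */
    }
    if (n == 0) {
        return SK_ERR_LIBRARY; /* peer closed the connection */
    }
    return SK_OK;
}

/* Propagate a receive failure to the caller instead of asserting on it. */
sk_error_t sketch_dispatch_get(int fd, char *out, int *refs)
{
    sk_error_t res;

    *refs += 1; /* stands in for taking a reference on the IPC context */

    res = sketch_recv(fd, out);
    if (res != SK_OK) {
        goto error_put; /* previously this case would have hit the assert */
    }
    /* On success the received data would be processed here. */

error_put:
    *refs -= 1; /* release the reference on every path */
    return res;
}
```

The point of the pattern is that a closed descriptor or an interrupted receive becomes a return value the client can handle, rather than an abort inside the library.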