Re: coroipcs_ipc_service_exit() dead loop

Jason,
thanks for the analysis. It took me quite a lot of time to understand
what is really happening, but I believe I've got it. I've created the patch
"[PATCH] Free confdb message holder list on confdb exit". Can you please
give it a try and paste the results?

How were you able to hit that bug (I mean, do you have a reproducer)?

Regards,
  Honza

jason wrote:
> Sorry, in the previous mail I didn't realize that
> after service_exit_schedwrk_handler() for confdb is done, the notify_pipe
> is closed, and therefore ipc_dispatch_send_from_poll_thread() won't increase
> conn->refcount. But if the scenario below occurs, the dead loop can still
> happen:
> 
> 1. confdb_notify_lib_of_key_change()/confdb_notify_lib_of_new_object()/...
> ( before objdb_notify_dispatch() )
> 2. service_exit_schedwrk_handler()
> 3. service_unlink_schedwrk_handler() // dead loop!
> 
> 
> 
> On Mon, Apr 22, 2013 at 10:29 PM, jason <huzhijiang@xxxxxxxxx> wrote:
> 
>> Hi All,
>>
>> I encountered a dead loop in the following code:
>>
>> coroipcs_ipc_service_exit() {
>> ...
>> while (conn_info_destroy (conn_info) != -1)
>>  ;
>> }
>>
>> It happened when the confdb service side was notifying the library side
>> about a key change (or object creation/destruction) while corosync was
>> unloading. When it happened, I saw conn_info->refcount = 3, and it was a
>> confdb IPC connection.
>>
>> By analysing the code I found that there is a gap
>> between service_exit_schedwrk_handler()
>> and service_unlink_schedwrk_handler(). If the confdb service side calls
>> confdb_notify_lib_of_key_change() in this gap (triggered by some other
>> service), conn_info->refcount will be increased
>> by ipc_dispatch_send_from_poll_thread(). Then, when we reach
>> coroipcs_ipc_service_exit(), the dead loop happens.
>>
>> Moreover, after service_exit_schedwrk_handler() for confdb is
>> done, objdb_notify_dispatch() is unregistered from poll, so there is no
>> further chance to decrease conn->refcount after this (even if we somehow
>> avoid the dead loop).
>>
>> The above is my conclusion from code analysis only. I don't have any idea
>> how to fix it, and I'm not even sure it is the root cause of the dead
>> loop. Please help.
>>
>> --
>> Yours,
>> Jason
>>
> 
> 
> 
> 
> 
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
> 
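
For readers following the thread, the failure mode described above can be
reduced to a toy model. This is only a sketch under stated assumptions, not
corosync code: struct toy_conn and every toy_* function are hypothetical
stand-ins, and the model assumes (as the quoted loop suggests) that the
destroy call returns -1 only once no references remain. toy_flush_pending()
is an illustrative cleanup step loosely inspired by the patch title; it does
not reproduce the actual patch. The exit loop is bounded by max_spins so the
stuck case can be observed instead of hanging.

```c
#include <assert.h>

/* Toy stand-in for a connection; NOT the real corosync struct conn_info. */
struct toy_conn {
    int refcount;        /* references still held on the connection */
    int pending_notify;  /* queued notifications, each holding one reference */
};

/* Models a notification queued in the gap between the exit and unlink
 * handlers: it takes a reference that only a later dispatch would drop. */
static void toy_queue_notify(struct toy_conn *c)
{
    c->refcount++;
    c->pending_notify++;
}

/* Hypothetical cleanup: discard queued notifications and release the
 * references they hold. */
static void toy_flush_pending(struct toy_conn *c)
{
    c->refcount -= c->pending_notify;
    c->pending_notify = 0;
}

/* Assumed contract: 0 while references remain, -1 once destroyed. */
static int toy_conn_destroy(struct toy_conn *c)
{
    if (c->refcount > 0) {
        return 0;
    }
    return -1;
}

/* Bounded version of the exit loop. Returns the number of retries needed,
 * or -1 if max_spins was reached (the "dead loop" case). */
static int toy_service_exit(struct toy_conn *c, int max_spins)
{
    int spins = 0;

    while (toy_conn_destroy(c) != -1) {
        if (++spins >= max_spins) {
            return -1;
        }
    }
    return spins;
}
```

With three leftover references (the value 3 is borrowed from the report
purely as an example), toy_service_exit() hits the spin limit, because
nothing inside the loop ever lowers refcount. After toy_flush_pending()
the same call returns at once.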




