Re: Bug when start/stop openais repeatedly

Lidong,
thanks for the patch. Can you please send me your analysis? I would really
like to understand the root cause and how this patch helps.

Regards,
  Honza

Lidong Zhong wrote:
> Hi,
>    We built a three-node cluster and repeatedly started and stopped openais on one of the nodes. The test script
> is as follows:
> #!/bin/sh
> # Repeatedly start and stop openais on this node.
>
> LOOP_COUNT=1000
>
> while [ "$LOOP_COUNT" -gt 0 ]
> do
>     # POSIX arithmetic expansion; `let` is not available in every /bin/sh
>     LOOP_COUNT=$((LOOP_COUNT - 1))
>     echo "test No. $((1000 - LOOP_COUNT))"
>     rcopenais start
>     sleep 30
>     rcopenais stop
>     sleep 10
> done
> 
> The error log looks like:
> Apr  3 11:35:56 hex-3 ocfs2_controld[3623]: Unable to open checkpoint "ocfs2:controld": Object does not exist
> Apr  3 11:35:56 hex-3 ocfs2_controld[3623]: Unable to open checkpoint "ocfs2:controld": Object does not exist
> After this error has appeared several times, the node is fenced.
> After some analysis, we think there is a race condition between corosync and the openais CKPT service, so we wrote
> a patch that effectively avoids this problem.
> The patch is attached below. Any review is highly appreciated.
> Thanks
> 
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss