Lidong,
thanks for the patch. Can you please send me your analysis? I would really like to understand the root cause, so that I know why this patch helps.

Regards,
  Honza

Lidong Zhong wrote:
> Hi,
> We built a cluster consisting of three nodes and started/stopped one of these nodes repeatedly. The test script is shown below:
>
> #!/bin/sh
>
> LOOP_COUNT=1000
>
> while [ $LOOP_COUNT -gt 0 ]
> do
>     LOOP_COUNT=$((LOOP_COUNT-1))
>     echo "test No. $((1000-LOOP_COUNT))"
>     rcopenais start
>     sleep 30
>     rcopenais stop
>     sleep 10
> done
>
> The error log looks like:
> Apr  3 11:35:56 hex-3 ocfs2_controld[3623]: Unable to open checkpoint "ocfs2:controld": Object does not exist
> Apr  3 11:35:56 hex-3 ocfs2_controld[3623]: Unable to open checkpoint "ocfs2:controld": Object does not exist
>
> After this error has appeared several times, the node ends up being fenced.
> After some analysis, we think there is a race condition between corosync and the openais CKPT service, so we wrote a patch that avoids this problem effectively.
> The patch is attached below. Any review is highly appreciated.
> Thanks
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss