Hi All,
I think I'm seeing an anomaly, so I'm just
posting here for a sanity check.
We have a stress test that does frequent
network partitioning/reunification to exercise
code related to node fail-back. We're based on
version 1.4.6. We were based on 2.x.x until
libqb made its way into the core of corosync. As
our company policy precludes us from using
anything but BSD-style-licensed third-party
source code, we had to either rewrite libqb or
go back.
Let's say we have four nodes: A, B, C and D.
A and B are on one side of a network segment and
C and D are on the other. The network can be
partitioned by pulling the cable connecting the
two segments.
When there is a configuration change, we need
to re-compute application state by sending
messages to the new members. Each such message
identifies the originating node and the size of
the cluster at the time, and is logged in the
application log.
When the cluster goes from (A, B, C, D) to
(A, B) and (C, D), on the (A, B) side we see a
message from A that says "From A, cluster size
is 2". Immediately thereafter there is another
config change that takes the cluster back to
(A, B, C, D). Now we see messages from A, C and
D saying that the cluster size is 4. But we see
two messages from B: the first says the cluster
size is 2 and the second says it is 4. It
appears that the message from B sent when the
cluster size was 2 could not be delivered, as a
config change followed right on its heels, so it
is being delivered to a configuration different
from the one in which it originated. Is this
expected behaviour?
Messages are originated by the totem protocol and
ordered according to EVS when they are taken off the
new-message queue and transmitted onto the network.
This is different from queueing a message (via cpg),
which is not origination. Are you sure you're not
confusing origination with cpg_mcast?
Generally, the correct way for an application to
behave according to EVS is to originate all
state-change messages via the protocol, and act on
them only when they are delivered. Some developers
instead change state when they call cpg_mcast rather
than when the message is delivered. That would
produce exactly the behaviour in your example.
Just to clarify, your application only changes state
on delivery of a message to the cpg application (not
on queueing via cpg_mcast)?