OK. Even I don't understand why you would not run corosync on each node
in that case.
AFAIK, the Totem protocol's performance depends on the size of the ring
(the number of nodes participating in the ring), and it degrades as the
ring grows:
- The token loss timeout is calculated based on the number of nodes in
the ring. As the number of nodes increases, the timeout also increases,
so failure detection and recovery take longer.
- The messaging delay increases with the number of nodes in the ring.
We have 12 blades per chassis, so if we run corosync on each blade, the
ring size would be 12 * <number of chassis>! So it would reduce the
performance.
CPG itself never initiates messages over the ring. What is happening:
- IPC client wants to send a CPG message
- test that no sync is in progress
- lock
- CPG sends the message
- unlock
Between lock and unlock, sync cannot happen. When the CPG service
receives a message from Totem, it never replies back through Totem.
So in the above scenario, does corosync delay sending the group messages
while the service is sending synchronization messages? How does it
differentiate between these messages?
It doesn't.
The use case for the above message sequence is as follows:
-- When a new node joins the ring and the CPG service is started on the
new node, all the other existing CPG services want to let the new node
know about the existing groups and their membership. So let's say that
the CPG services are sending updates about their group membership to the
new service, and meanwhile they receive a groupLeave (for example, because
the app crashes) and send a leave message to all the CPG services over
Totem. We must ensure the proper ordering of the group updates and the
leave messages; otherwise the services will go out of sync.
I didn't get the question. Can you please try to come up with an example?
As per my understanding, the corosync services process any client
request only when they receive it back over the ring, i.e.:
- Client sends a request
- The request goes to the corresponding service on the local corosync
daemon on the same node.
- The service broadcasts the request over the ring, so that all the
other services receive the request. The sender also receives its own
request back.
- When each of the services receives the above request through the ring,
it then processes the request.
For example, the cpg_join() request from the client goes to the CPG
service on the same node through the libqb IPC interface. The CPG service
that receives the "cpg_join" request from the client broadcasts it over
the ring to all the CPG services. So every CPG service on the ring
(including the sender) receives this message back in the msgDelivery
callback and then processes the join request, i.e. it adds the new
member to the group and sends out the configChange callback to each of
the members in the ring.
So let's consider the following scenario:
- An application invokes the cpg_leave() API.
- The request goes to the local CPG service, which broadcasts this
message to all the other CPG services by invoking
cpg_node_joinleave_send. Note that the member is removed from the group
data structure only when the services receive this message back through
the message delivery callback.
- Meanwhile, if another node joins and a sync is invoked, it is possible
that the nodes send a sync message that includes the member whose leave
request is being processed (the request has been received and sent over
the ring, but has not been received back yet).
- So the order of messages received by the new node could be:
-- It receives leave message for the member
-- Then it receives sync message from other nodes
In the above sequence, the CPG service on the new node does not know
about the member when it receives the leave message first, so it might
just discard that message. It then receives the sync and updates its
data structure with the member who has already left.
The other nodes would receive the leave request and remove the member
from their data structures, while the new node would keep the member
because it missed the leave message! So this would lead to an
inconsistent view of the groups across the nodes, wouldn't it?
Regards,
Shridhar
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss