Re: Questions about sync callbacks in the services

On 04/10/2013 11:39 AM, Shridhar Sahukar wrote:
Ok. Even I don't understand why we shouldn't have corosync on each node in
that case.
AFAIK, the Totem protocol's performance depends directly on the size of the
ring (the number of nodes participating in the ring):
- The token loss timeout is calculated from the number of nodes in the ring.
As the number of nodes increases, the timeout also increases, so failure
detection and recovery take longer (see the rough sketch below).
- The messaging delay increases with the number of nodes in the ring.

We have 12 blades per chassis, so if we run corosync on each blade, the
ring size would be 12 * <number of chassis>!

So it would hurt performance.
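
For illustration, here is a rough sketch (not corosync source) of how the
effective token timeout can grow with ring size, assuming the corosync 2.x
token_coefficient scheme from corosync.conf(5). The defaults used here
(token = 1000 ms, token_coefficient = 650 ms) and the formula are
assumptions on my side, so check the man page for your version:

#include <stdio.h>

int main(void)
{
	/* Assumed defaults -- see corosync.conf(5) for the real values.  */
	const unsigned token_ms = 1000;       /* base token timeout       */
	const unsigned token_coeff_ms = 650;  /* added per node beyond 2  */

	for (unsigned nodes = 4; nodes <= 64; nodes *= 2) {
		/* Assumed formula: token + (nodes - 2) * token_coefficient */
		unsigned timeout_ms = token_ms + (nodes - 2) * token_coeff_ms;
		printf("%2u nodes -> token timeout ~%u ms\n", nodes, timeout_ms);
	}
	return 0;
}

With these assumed numbers, a 60-node ring (12 blades * 5 chassis) would put
the token timeout near 39 seconds, which is the kind of failure-detection
cost I mean above.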
Also, I just saw the discussion on "Re: Questions about sync callbacks in the services", which says:
Officially supported are 16 nodes, but successes have been reported for 32 and
even 64 nodes. 64 nodes is the maximum (I'm really not sure exactly WHY this
limit exists).
That means we can have a cluster of no more than 5 chassis. This won't work for us!

Regards,
Shridhar


CPG itself never sends a message over the ring. What happens is:
- the IPC client wants to send a cpg message
- test that no sync is in progress
- lock
- cpg sends the message
- unlock

Between lock and unlock, sync cannot happen. When the CPG service receives
a message from totem, it never replies back through totem.
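
A minimal sketch of that ordering, with made-up names (this is not the
actual corosync source; the mutex and the flag are only stand-ins for the
real sync gating):

/* build: gcc -pthread sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t sync_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool sync_in_progress = false;      /* stand-in "sync running" flag */

static void totem_mcast(const char *msg)   /* stand-in for the totem send  */
{
	printf("multicast over the ring: %s\n", msg);
}

/* Called when an IPC client asks CPG to multicast a message. */
static void handle_ipc_cpg_mcast(const char *msg)
{
	pthread_mutex_lock(&sync_mutex);       /* lock                         */
	if (!sync_in_progress)                 /* test that no sync is running */
		totem_mcast(msg);                  /* cpg sends the message        */
	pthread_mutex_unlock(&sync_mutex);     /* unlock: sync may start again */
}

int main(void)
{
	handle_ipc_cpg_mcast("cpg message from an IPC client");
	return 0;
}

The real code is more involved; the only point of the sketch is that sync
cannot slip in between the test and the send.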


So in the above scenario, does corosync delay sending the group messages
while the service is sending synchronization messages? How does it
differentiate between these messages?
It doesn't.

The use case for the above message sequence is this:

-- When a new node joins the ring and the CPG service is started on the
new node, all the other existing CPG services want to let the new node
know about the existing groups and their membership. So let's say the CPG
services are sending the new node updates about their group membership,
and at the same time they receive a group leave (for example, because the
app crashes) and send a leave message to all the CPG services over totem.
We must ensure proper ordering of the group updates and the leave
messages; otherwise the services will go out of sync.

I didn't get the question. Can you please try to come up with an example?
As per my understanding, the corosync services process any client request
only when they receive it back over the ring, i.e.:
- The client sends a request.
- The request goes to the corresponding service in the local corosync
daemon on the same node.
- The service broadcasts the request over the ring, so that all the
other services receive it. The request message is also received back by
the sender.
- When each of the services receives the request through the ring, it
then processes the request.
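
A tiny sketch of that model (the names are made up, not corosync
internals): nothing is applied when the request arrives over IPC; it is
applied only when it is delivered back from the ring, so every node,
the sender included, applies it in the same order.

#include <stdio.h>

static void deliver_from_ring(const char *request);

/* Stand-in for broadcasting over the ring.  Here we simply "deliver" the
 * request straight back to ourselves, which the ring does for the sender
 * as well as for everyone else. */
static void ring_broadcast(const char *request)
{
	deliver_from_ring(request);
}

/* IPC entry point: the request is NOT applied here, only broadcast. */
static void handle_ipc_request(const char *request)
{
	ring_broadcast(request);
}

/* Delivery callback: only here is the request actually processed. */
static void deliver_from_ring(const char *request)
{
	printf("applying request: %s\n", request);
}

int main(void)
{
	handle_ipc_request("join group \"demo\"");   /* hypothetical request */
	return 0;
}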

For example, the cpg_join() request from the client goes to the cpg
service on the same node through the qb IPC interface. The cpg service
that receives the "cpg_join" request from the client broadcasts it over
the ring to all the CPG services. So every cpg service on the ring
(including the sender) receives this message back in the message delivery
callback and then processes the join request, i.e. it adds the new member
to the group and sends out the configuration change callback to each of
the members in the ring.
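
For reference, a minimal client-side sketch of that flow using the CPG
API (error handling trimmed; the group name is just an example): the
joiner itself only sees the effect of its own cpg_join() through the same
configuration change callback every other member gets, i.e. after the
join has been delivered back over the ring.

/* build: gcc cpg_demo.c -lcpg */
#include <corosync/cpg.h>
#include <stdio.h>
#include <string.h>

static void deliver_cb(cpg_handle_t handle, const struct cpg_name *group,
	uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len)
{
	printf("delivery from node %u pid %u: %.*s\n",
	       nodeid, pid, (int)msg_len, (char *)msg);
}

static void confchg_cb(cpg_handle_t handle, const struct cpg_name *group,
	const struct cpg_address *members, size_t n_members,
	const struct cpg_address *left, size_t n_left,
	const struct cpg_address *joined, size_t n_joined)
{
	/* Fires on every member, the joiner included, once the join/leave
	 * has been delivered back over the ring. */
	printf("confchg in %.*s: %zu members, %zu joined, %zu left\n",
	       (int)group->length, group->value, n_members, n_joined, n_left);
}

int main(void)
{
	cpg_callbacks_t callbacks = {
		.cpg_deliver_fn = deliver_cb,
		.cpg_confchg_fn = confchg_cb,
	};
	cpg_handle_t handle;
	struct cpg_name group;

	strcpy(group.value, "demo_group");      /* example group name */
	group.length = strlen(group.value);

	if (cpg_initialize(&handle, &callbacks) != CS_OK)
		return 1;

	cpg_join(handle, &group);               /* goes to the local cpg service  */
	cpg_dispatch(handle, CS_DISPATCH_ONE);  /* our own join arrives as confchg */

	cpg_leave(handle, &group);
	cpg_dispatch(handle, CS_DISPATCH_ONE);

	cpg_finalize(handle);
	return 0;
}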

So let's consider the following scenario:

- An application invokes the cpg_leave() API.
- The request goes to the local cpg service, which broadcasts this message
to all the other cpg services by invoking cpg_node_joinleave_send. Note
that the member will be removed from the group data structure only when
all the services receive this message back through the message delivery
callback.
- Meanwhile, if another node joins and a sync is invoked, it is possible
that the nodes send the sync message including the member whose leave
request is being processed (the request has been received and sent over
the ring, but hasn't been received back yet).
- So the order of messages received by the new node could be:
     -- It receives the leave message for the member.
     -- Then it receives the sync message from the other nodes.

In the above sequence, the cpg service on the new node does not know
about the member when it receives the leave message first, so it might
just discard that message. It then receives the sync and updates its data
structure with the member that has already left.

The other nodes would receive the leave request and remove the member
from their data structures, while the new node would keep the member
because it missed the leave message! So it would lead to an inconsistent
view of the groups across the nodes. Wouldn't it?
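
To make the concern concrete, here is a toy simulation of the ordering I
am worried about (entirely made up, not corosync code): the new node drops
a leave for a member it does not know yet, then installs a sync snapshot
that still contains that member, and ends up disagreeing with the rest of
the ring.

#include <stdio.h>
#include <string.h>

/* Toy membership table for one group on one node. */
#define MAX_MEMBERS 8
static char members[MAX_MEMBERS][16];
static int n_members;

/* A leave delivered from the ring: remove the member if we know it,
 * otherwise silently drop it -- this is the worrying case. */
static void on_leave(const char *m)
{
	for (int i = 0; i < n_members; i++) {
		if (strcmp(members[i], m) == 0) {
			memmove(&members[i], &members[i + 1],
			        (size_t)(n_members - i - 1) * sizeof(members[0]));
			n_members--;
			return;
		}
	}
}

/* A sync snapshot from the existing nodes: install it as-is. */
static void on_sync(const char *snapshot[], int n)
{
	n_members = n;
	for (int i = 0; i < n; i++)
		snprintf(members[i], sizeof(members[i]), "%s", snapshot[i]);
}

int main(void)
{
	/* Snapshot built by the old nodes before member-B's leave finished. */
	const char *stale_snapshot[] = { "member-A", "member-B" };

	/* Order seen by the NEW node: */
	on_leave("member-B");          /* unknown member -> discarded */
	on_sync(stale_snapshot, 2);    /* installs member-B anyway    */

	printf("new node's view of the group (%d members):\n", n_members);
	for (int i = 0; i < n_members; i++)
		printf("  %s\n", members[i]);  /* member-B lingers: views diverge */
	return 0;
}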

Regards,
Shridhar

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
