Shridhar Sahukar wrote:
> Hi Honza,
>
> Thanks for the clarifications.
>
>> Just curious. Why are you doing that? Is there something you are missing
>> in CPG? If so, can you please try to describe what that is? Because if
>> that functionality is generally useful, we can consider implementing it
>> in CPG (or creating a new service).
> There are some minor usage issues with the CPG service, for example, it
> uses pid as the memberId, but apps might want to specify their own
> memberId to join the groups; but this is not the main reason for me to
> go down this path.
>
> We are trying to use corosync in an unusual way. We are creating a
> hierarchical cluster (a cluster of chassis) whose members interact with
> each other. We want to use corosync to form this cluster of clusters. So
> we would want only one instance of corosync running per chassis, but we
> want apps running on different blades of the chassis to be able to talk
> to the daemon running on a single blade. This cannot be achieved unless
> we change QB to support TCP as a transport so that the cpg library can
> talk to the cpg service through the TCP transport.
>
> I considered enhancing QB for this support, but it seems like a lot more
> work. So the approach we decided to take is this:
> - We already have an IPC library for our applications to talk to each
>   other.
> - So we are converting one of our apps into a corosync service which
>   implements CPG functionality and can provide the same functionality to
>   our apps.

OK. Even so, I don't understand why you would not run corosync on each
node in that case.

>
>> Corosync is basically a state machine with the following states:
>> Operational, Gather, Recovery. The sync service more or less adds a new
>> state, syncing (even though, from the totem perspective, it is still the
>> Operational state).
>>
>> When corosync membership changes (a new processor appeared or some
>> processor(s) disappeared), corosync starts gathering, then recovery and
>> hopefully ends in the operational state. Then, sync_init is called (with
>> the list of processors).
>> It is usually used for saving membership. Right
>> after that, sync_process is called. sync_process can return -1 when it
>> wishes to be called again, or 0 if not. sync_process is protected by a
>> barrier, so until ALL messages sent in sync_init/sync_process are
>> delivered to ALL nodes, it doesn't continue. When ALL messages from ALL
>> processors for the given services are delivered, sync_activate is called
>> (this is the correct place to inform IPC clients about the membership
>> change). The last one is sync_abort. This is called when a new node
>> appeared or some node(s) disappeared during sync_process. If that
>> happens, the whole sync process repeats (so sync_init is called, then
>> sync_process, ...).
>>
> So does it mean that any other messages sent by the services while sync
> is happening won't be sent until sync completes?
>
> For example:
>
> At T1 -- App sends a message for a CPG group
>    T2 -- CPG service receives the message and is about to send it over
>          the totem ring to all the other services on the ring
>    T3 -- Before the service could send the message over the ring, it
>          receives a sync_init/sync_process and it starts sending
>          messages to other peers as part of synchronization.
>    T4 -- While sync is in progress, it sends the message over the ring
>

CPG itself never sends a message over the ring. What is happening:
- IPC client wants to send a cpg message
- test that no sync is in progress
- lock
- cpg sends message
- unlock

Between lock and unlock, sync cannot happen. When the CPG service receives
a message from totem, it will never reply back through totem.

> So in the above scenario, does corosync delay sending the group messages
> while the service is sending synchronization messages? How does it
> differentiate between these messages?

It doesn't.
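
The sync callback order and the IPC send path described above can be
sketched as a small single-process toy model in C. This is only an
illustration under stated assumptions: every name in it (toy_service,
run_sync_round, ipc_send, and so on) is hypothetical and is not the
corosync API, the barrier is reduced to a comment, the lock is a plain
flag instead of a real mutex, and returning -1 from ipc_send while sync
is in progress is an assumed "try again later" convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are the corosync API. */
enum { SYNC_DONE = 0, SYNC_AGAIN = -1 };

struct toy_service {
    unsigned members[8];      /* membership saved by sync_init */
    size_t   member_count;
    int      chunks_left;     /* sync messages still to be sent */
    bool     sync_in_progress;
    bool     locked;          /* stand-in for a real mutex */
    int      activations;     /* times sync_activate ran */
    int      aborts;          /* times sync_abort ran */
    int      sent;            /* client messages handed to the ring */
};

/* Called with the new list of processors; saves the membership. */
static void sync_init(struct toy_service *s, const unsigned *list,
                      size_t n, int chunks)
{
    s->member_count = n < 8 ? n : 8;
    for (size_t i = 0; i < s->member_count; i++)
        s->members[i] = list[i];
    s->chunks_left = chunks;
    s->sync_in_progress = true;
}

/* Sends one sync message per call; -1 means "call me again". */
static int sync_process(struct toy_service *s)
{
    if (s->chunks_left > 0)
        s->chunks_left--;
    return s->chunks_left > 0 ? SYNC_AGAIN : SYNC_DONE;
}

/* Sync finished: the place to tell IPC clients about the change. */
static void sync_activate(struct toy_service *s)
{
    s->sync_in_progress = false;
    s->activations++;
}

/* Membership changed again during sync_process. */
static void sync_abort(struct toy_service *s)
{
    s->aborts++;
}

/* One sync round. Returns false if it was aborted, in which case the
 * caller starts a new round with the new list of processors. */
static bool run_sync_round(struct toy_service *s, const unsigned *list,
                           size_t n, int chunks, bool membership_changed)
{
    sync_init(s, list, n, chunks);
    for (;;) {
        int rc = sync_process(s);
        if (membership_changed) {
            sync_abort(s);
            return false;
        }
        if (rc == SYNC_DONE)
            break;
    }
    /* The real barrier (all messages delivered to all nodes) sits here. */
    sync_activate(s);
    return true;
}

/* IPC send path: test for sync, lock, send, unlock. */
static int ipc_send(struct toy_service *s)
{
    if (s->sync_in_progress)
        return -1;            /* assumption: the client retries later */
    assert(!s->locked);
    s->locked = true;         /* lock */
    s->sent++;                /* cpg sends the message */
    s->locked = false;        /* unlock */
    return 0;
}
```

In this model an aborted round leaves sync_in_progress set, so ipc_send
keeps refusing client messages until a later round reaches sync_activate.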
>
> The use case for the above message sequence is this:
>
> -- When a new node joins the ring and the CPG service is started on the
> new node, all the other existing CPG services would want to let the new
> node know about the existing groups and the group membership. So let's
> say that CPG services are sending the updates to the new service about
> their group membership and, if they receive a groupLeave (for example,
> the app crashes), they send a leave message to all the CPG services over
> totem. We must ensure the proper ordering of the group updates and the
> leave messages. Otherwise the services will go out of sync.
>

I didn't get the question. Can you please try to come up with an example?

> Regards,
> Shridhar
>

Honza