Re: overhead? send/receive cpg messages

Mike Rosenlof <Mike.Rosenlof@xxxxxxxxxxx> · Tue, 4 Mar 2014 22:02:21 +0000

________________________________________
From: Jan Friesse [jfriesse@xxxxxxxxxx]
Sent: Monday, March 03, 2014 2:30 AM
To: Mike Rosenlof; discuss@xxxxxxxxxxxx
Subject: Re:  overhead?  send/receive cpg messages

Mike Rosenlof wrote:
>
> Hi,
>
> I'm using corosync 1.4.1 on RHEL 6.2. I have a cluster of two nodes, and we are passing node-to-node messages with the CPG message API (cpg_model_initialize, cpg_join, cpg_dispatch, etc.).
>
> While the messaging is idle, if I run 'tcpdump' on the corosync source port 5404, we get three messages every couple of seconds:
>
> 14:47:41.384240 IP g5se-48521a.hpoms-dps-lstn > g5se-ad665b.netsupport: UDP, length 107
> 14:47:41.384693 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107
> 14:47:41.593953 IP g5se-48521a.hpoms-dps-lstn > 239.192.42.210.netsupport: UDP, length 119
>
> 14:47:43.288532 IP g5se-48521a.hpoms-dps-lstn > g5se-ad665b.netsupport: UDP, length 107
> 14:47:43.289109 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107
> 14:47:43.498385 IP g5se-48521a.hpoms-dps-lstn > 239.192.42.210.netsupport: UDP, length 119
>
> etc...
>
> Now when the node application sends a message to another node
>     stat=cpg_mcast_joined( commHandle.cpgHandle, CPG_TYPE_FIFO, &iov, 1 );
>
> tcpdump captures a surprisingly large number of packets
>
> 14:47:45.525430 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107
> 14:47:45.525576 IP g5se-48521a.hpoms-dps-lstn > g5se-ad665b.netsupport: UDP, length 107
> 14:47:45.526354 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107
> [snip, 59 length 107 messages deleted!]
> 14:47:45.540345 IP g5se-48521a.hpoms-dps-lstn > g5se-ad665b.netsupport: UDP, length 107
> 14:47:45.540603 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107
>
>
> Does anybody have an idea why there are so many messages for corosync to transmit a message of approximately 12 bytes?
>
> thank you for any insight here...
>

[Jan Friesse]
Corosync rotates a token between the nodes as a heartbeat (every few
seconds while idle), and when messages are sent, the token must rotate
more quickly. The token itself is quite small.

To see actual messages, filter mcast packets.
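One way to apply this to a text capture like the one quoted above is to sort packets by destination address: IPv4 multicast addresses fall in 224.0.0.0/4, so a destination such as 239.192.42.210 is multicast, while the host-to-host packets are unicast. The following is only a minimal sketch in C. The helper name is_ipv4_multicast is hypothetical and not part of corosync or tcpdump, and it assumes the destination is printed as a dotted-quad address (a resolved hostname is reported as "not multicast").

```c
#include <stdio.h>

/* Hypothetical helper (not part of corosync or tcpdump).
 * Returns 1 if the string starts with a dotted-quad IPv4 address in the
 * multicast range 224.0.0.0/4, and 0 otherwise. Trailing text such as
 * tcpdump's ".netsupport:" port suffix is ignored. Hostnames and
 * malformed input yield 0. */
static int is_ipv4_multicast(const char *addr)
{
    unsigned a, b, c, d;

    if (sscanf(addr, "%u.%u.%u.%u", &a, &b, &c, &d) != 4)
        return 0;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return 0;
    /* The first octet of every IPv4 multicast address is 224..239. */
    return a >= 224 && a <= 239;
}
```

Passing the destination field of each tcpdump line to this function would separate the length-119 packets sent to 239.192.42.210 from the length-107 packets exchanged directly between the two hosts.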

Does this answer your question?

>

[me] Not exactly. I can see that a heartbeat goes around the nodes while idle. What alarmed me was this part of the sequence:

> 14:47:45.525430 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107
> 14:47:45.525576 IP g5se-48521a.hpoms-dps-lstn > g5se-ad665b.netsupport: UDP, length 107
> 14:47:45.526354 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107
> [snip, 59 length 107 messages deleted!]
> 14:47:45.540345 IP g5se-48521a.hpoms-dps-lstn > g5se-ad665b.netsupport: UDP, length 107
> 14:47:45.540603 IP g5se-ad665b.hpoms-dps-lstn > g5se-48521a.netsupport: UDP, length 107

This is the point where one node sent out a message (15 bytes) with cpg_mcast_joined(). Note the timestamps: in less than 20 ms there were over a hundred messages to support that one data packet. That's the part that seems like a lot of overhead.
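To put a number on the burst, the elapsed time between the first and last quoted packets can be computed from the tcpdump timestamps. This is a rough sketch in C. The helper name tcpdump_ts_to_usec is made up for illustration, and it assumes tcpdump's default "HH:MM:SS.uuuuuu" format with exactly six fractional digits.

```c
#include <stdio.h>

/* Hypothetical helper: convert a tcpdump "HH:MM:SS.uuuuuu" timestamp to
 * microseconds since midnight. Returns -1 on malformed input.
 * Assumes exactly six fractional digits (tcpdump's default precision). */
static long long tcpdump_ts_to_usec(const char *ts)
{
    int h, m, s;
    long us;

    if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
        return -1;
    return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}
```

For the burst quoted above, tcpdump_ts_to_usec("14:47:45.540603") minus tcpdump_ts_to_usec("14:47:45.525430") comes to 15173 microseconds, i.e. about 15 ms from the first packet to the last.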

Is this a configuration issue?  It's a cluster of two nodes...

--mike

>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
>