Antw: Re: Antw: Re: Meaning of some objctl variables

>>> Steven Dake <steven.dake@xxxxxxxxx> wrote on 06.06.2015 at 06:34 in message
<CAPwfPsjaCm1_3RMAYh_L07sjnwTCfKe0CmksHNWXTp0rqMYrSA@xxxxxxxxxxxxxx>:
> On Wed, Jun 3, 2015 at 6:57 AM, Ulrich Windl <
> Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
>> >>> Jan Friesse <jfriesse@xxxxxxxxxx> wrote on 03.06.2015 at 13:44 in message
>> <556EE895.2040107@xxxxxxxxxx>:
>> > Ulrich Windl wrote:
>> >> (This is a re-send of my message from 2015-05-15, because my
>> >> subscription had not actually worked.)
>> >> Hello!
>> >>
>> >> I've been monitoring some corosync objctl variables to find out what's
>> >> going on. I have some results, but I don't really know what the
>> >> variables are telling me. Maybe someone can comment on them; what do
>> >> they mean?
>> >>
>> >> runtime.totem.pg.mrp.srp.orf_token_rx increases by about 142 per second
>> >> runtime.totem.pg.mrp.srp.memb_merge_detect_tx (and rx) increases by
>> >> about 2 per second
>> >> runtime.totem.pg.mrp.srp.mcast_tx increases by about 40 per second
>> >> runtime.totem.pg.mrp.srp.mcast_rx increases by only 3 per second
>> >> runtime.totem.pg.mrp.srp.token_hold_cancel_tx (and rx) increases by
>> >> 1 to 5 per second
>> >> runtime.totem.pg.mrp.srp.mtt_rx_token varies from 0 to 24
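The per-second figures above come from sampling the counters twice and dividing the difference by the interval. A minimal Python sketch of that calculation, assuming each snapshot is `key=value` text with one counter per line, as printed by something like `corosync-objctl runtime.totem.pg.mrp.srp` (the exact output format of corosync 1.4.7 is an assumption here). The helper names `parse_snapshot` and `rates_per_second` are hypothetical, and the sample values are invented so that they reproduce the rates quoted above. Since `mtt_rx_token` goes up and down rather than only increasing, it is probably better read directly than turned into a rate.

```python
"""Estimate per-second rates of corosync totem counters from two snapshots.

Assumption: each snapshot is text of the form "key=value", one counter per
line (e.g. captured from `corosync-objctl runtime.totem.pg.mrp.srp`).
"""


def parse_snapshot(text):
    """Return {key: int} for every line of the form key=<non-negative integer>."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def rates_per_second(before, after, interval_s):
    """Per-second increase of every counter present in both snapshots."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    old, new = parse_snapshot(before), parse_snapshot(after)
    return {k: (new[k] - old[k]) / interval_s for k in old if k in new}


if __name__ == "__main__":
    # Invented sample values, taken 10 seconds apart.
    t0 = """runtime.totem.pg.mrp.srp.orf_token_rx=1000
runtime.totem.pg.mrp.srp.mcast_tx=500
runtime.totem.pg.mrp.srp.mcast_rx=200"""
    t1 = """runtime.totem.pg.mrp.srp.orf_token_rx=2420
runtime.totem.pg.mrp.srp.mcast_tx=900
runtime.totem.pg.mrp.srp.mcast_rx=230"""
    for key, rate in sorted(rates_per_second(t0, t1, 10.0).items()):
        print(f"{key}: {rate:.1f}/s")
```

With these invented values the script prints 3.0/s for mcast_rx, 40.0/s for mcast_tx and 142.0/s for orf_token_rx.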
>> >>
>> >> I wonder whether our configuration looks sane and, if not, which
>> >> parameters to change.
>> >
>> > You didn't send your configuration. It is stored in
>> > /etc/corosync/corosync.conf or /etc/cluster/cluster.conf.
>>
>> I deliberately left it out to force a "black-box" view of the problem.
>>
>> >
>> > But yes, increasing rx/tx values are normal and expected.
>>
>> Yes, that was easy to guess even for me. But what about
>> "token_hold_cancel_tx" and "mtt_rx_token"?
>>
>>
> These are internal Totem operational indicators.  token_hold_cancel_tx is
> how many times a token hold cancel has been sent.  On a lightly loaded
> Totem network, the token would race around the ring.  We introduced a
> feature which stops the token after a certain number of rotations and then
> holds it there for the "token_hold_timeout" period.  This resulted in
> latency when a node actually wanted to send a message, so we introduced the
> token hold cancel message, which tells the cluster to release the token so
> that messages can be sent.

Great, I understand. What do you think about writing a manual page (or adding to an existing one) that describes these parameters? This might save you the work of answering such questions in the future.

So without that mechanism the token would rotate at maximum speed?
I also wonder: normally the node holding the token is the one that may send, right? So which node is allowed to send the "token_hold_cancel"? Does this introduce a race condition (the token could be released at the very moment another node sends the token_hold_cancel)?

> 
> mtt_rx_token is an internal flow control variable and is documented in
> great detail in the Totem specification.  It is difficult to explain to
> someone who hasn't read the Totem specification hundreds of times, but in essence

You have a point: I have only read the specification twice, and not very thoroughly. ;-)

> it is part of the totem algorithm that keeps the flow of messages coming
> equally from each node as the token rotates around the network.  How
> precisely it does this could warrant a 5 page essay, so I'll spare you the
> details.

If you like my earlier documentation proposal, you could add a reference to the TOTEM spec (assuming its sections are numbered for reference).

Finally: which official specification are you referring to? I wonder where to get the original papers...
[AgMM00] Agarwal, D. A.; Melliar-Smith, P. M.; Moser, L. E.: Totem: A Protocol for Message Ordering in a Wide-Area Network

> 
> Hope these details help.

Yes, thanks a lot.

Regards,
Ulrich

> 
> regards
> -steve
> 
> 
>> >
>> >>
>> >> corosync-1.4.7 of SLES11 running on a Xen paravirtualized host...
>> >
>> > I would suggest asking SUSE, because their version may differ from the
>> > upstream one. Also, isn't that why you are paying them for support?
>>
>> I doubt they changed the fundamental meaning of the variables or the basic
>> protocol.
>> Or is that a polite way of saying "I don't want to help you"?
>>
>> If it puts your mind at ease: I have actually had an issue about an odd
>> communication problem open with SLES support for weeks.
>>
>> Regards,
>> Ulrich
>>
>>
>>
>> _______________________________________________
>> discuss mailing list
>> discuss@xxxxxxxxxxxx 
>> http://lists.corosync.org/mailman/listinfo/discuss 
>>



