Re: Endless "A processor joined or left the membership and a new membership" messages even with a single node

Jeremy Fitzhardinge wrote:
On 08/29/2012 03:33 AM, Jan Friesse wrote:
Jeremy Fitzhardinge wrote:
Using the needle-2.0 branch on github (which I assume corresponds to the
2.0.1 release, though I guess I forgot to confirm that), I'm seeing

It's 2.0.1 + additional (not yet released) fixes

endless "A processor joined or left the membership and a new membership
(192.168.169.183:329052) was formed." messages, even with just a single
node up.

Is this a problem, or expected?  At the very least, it seems like a lot
of noise.


This is a problem and it's not expected.

it also seems somewhat unstable when starting a second node; often the
corosync on the first node seems to quietly die, without really
indicating any particular problem in its log file.


This is also a problem and not expected.

I'm sorry, this report was the result of a mis-applied local change.

Ok

BTW, would you be interested in a patch to make the nss dependency
optional (obviously, removing all the crypto aspects of corosync)?  We

I can understand that for some systems it may be useful to NOT have NSS as a dependency. On the other hand, production systems usually (like 99% of them) use encrypted and signed messages. This is why we made NSS a HARD requirement.
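
(For context, the on-wire encryption and signing being discussed here is what the totem option below turns on; this is only the relevant fragment of corosync.conf, not a complete configuration.)

totem {
        # NSS-backed encryption and signing of cluster traffic
        secauth: on
}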

If you send a patch, I will for sure at least review it, but honestly I'm really unsure whether we would also apply it. Such a patch must be extremely small and clean, and the number of changes must be minimized to almost zero. Also, the default compile must still be WITH NSS, and auto-detection is rather a no-no, so I would prefer something like --disable-nss.

need it locally because we're using it in an environment where other
dependencies are very awkward, and network security is handled at
another layer.

I also have other problems with endless retransmit lists in 1.4.4
when a second node comes up, but I'll report those separately once I've
done a bit more investigation (it could be a problem with multicast on
that particular network, but that raises its own set of interesting
questions).


Corosync really depends on correctly configured multicast, and the number of problems caused by incorrectly configured mcast is endless. This is also the reason why we have UDPU.
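
(For reference, a minimal UDPU setup on corosync 2.x looks roughly like the fragment below; the node addresses are only examples. The 1.4.x series uses a different syntax, with member {} entries inside the interface section instead of a nodelist.)

totem {
        version: 2
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.169.0
        }
}

nodelist {
        node {
                ring0_addr: 192.168.169.183
        }
        node {
                ring0_addr: 192.168.169.184
        }
}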

Thanks,
     J


Regards,
  Honza


Can you please
I'm just using corosync's CPG API.
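
(For reference, a minimal CPG client looks roughly like the following sketch; this is only an illustration, not the actual program being discussed. The group name and message are made up, error handling is mostly omitted, and it is built with -lcpg.)

#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <corosync/cpg.h>

static void deliver_cb(cpg_handle_t handle, const struct cpg_name *group,
                       uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len)
{
        printf("message from node %u pid %u: %.*s\n",
               nodeid, pid, (int)msg_len, (char *)msg);
}

static void confchg_cb(cpg_handle_t handle, const struct cpg_name *group,
                       const struct cpg_address *members, size_t n_members,
                       const struct cpg_address *left, size_t n_left,
                       const struct cpg_address *joined, size_t n_joined)
{
        printf("membership changed: %zu members now\n", n_members);
}

int main(void)
{
        cpg_handle_t handle;
        cpg_callbacks_t callbacks = {
                .cpg_deliver_fn = deliver_cb,
                .cpg_confchg_fn = confchg_cb,
        };
        /* "demo" is a made-up group name, purely for illustration */
        struct cpg_name group = { .length = 4, .value = "demo" };
        struct iovec iov = { .iov_base = "hello", .iov_len = 5 };

        if (cpg_initialize(&handle, &callbacks) != CS_OK)
                return 1;
        cpg_join(handle, &group);                            /* join the process group */
        cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);  /* send one message */
        cpg_dispatch(handle, CS_DISPATCH_ALL);               /* run pending callbacks */
        cpg_leave(handle, &group);
        cpg_finalize(handle);
        return 0;
}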

With debugging on, the full cycle of messages is below, using the
config:

totem {
          version: 2
          secauth: off
          nodeid: 2422866046
          threads: 0
          clear_node_high_bit: yes
          vsftype: none

          # Auto-set mcast from ringid
          cluster_name: 2f3b3e8cb73e5d3693b3439dc10acbaddf3a2533

          interface {
                  # The following values need to be set based on your environment
                  ringnumber: 0
                  bindnetaddr: 192.168.169.0
                  # NOTE: corosync uses 2 ports: mcastport and mcastport-1 !
                  #mcastaddr:
                  #mcastport: 2935
          }
}

logging {
          fileline: on
          to_stderr: yes
          to_logfile: no
          to_syslog: no
          syslog_facility: daemon
          timestamp: on
          debug: on
          logger_subsys {
                subsys: CPG
                debug: off
          }
}


I've tried the same config file and got no unexpected behavior.

Can you please try to send the corosync-blackbox data (just run
corosync-blackbox and save /var/lib/corosync/fdata somewhere on the net)?


Any clues?

Thanks,
      J

Aug 26 06:14:35 debug   [SYNC  ] sync.c:232 Committing synchronization for corosync cluster closed process group service v1.01
Aug 26 06:14:35 notice  [MAIN  ] main.c:273 Completed service synchronization, ready to provide service.
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2427 releasing messages up to and including 1
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2427 releasing messages up to and including 6
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1979 entering GATHER state from 9.
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3020 Creating commit token because I am the rep.
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1471 Saving state aru 6 high seq received 6
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3265 Storing new sequence id for ring 5371c
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2035 entering COMMIT state.
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:4351 got commit token
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2072 entering RECOVERY state.
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2114 TRANS [0] member 192.168.169.183:
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2118 position [0] member 192.168.169.183:
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2122 previous ring seq 53718 rep 192.168.169.183
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2128 aru 6 high delivered 6 received flag 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2226 Did not need to originate any messages in recovery.
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:4351 got commit token
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:4404 Sending initial ORF token
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0 high seq received 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0 high seq received 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0 high seq received 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0 high seq received 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3696 retrans flag count 4 token aru 0 install seq 0 aru 0 0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1487 Resetting old ring state
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1693 recovery to regular 1-0
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1779 Delivering to app 7 to 6
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1905 entering OPERATIONAL state.
Aug 26 06:14:35 notice  [TOTEM ] totemsrp.c:1909 A processor joined or left the membership and a new membership (192.168.169.183:341788) was formed.
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message added to pending queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3768 Delivering 0 to 1
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST message with seq 1 to pending delivery queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message added to pending queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message added to pending queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message added to pending queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message added to pending queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message added to pending queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3768 Delivering 1 to 6
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST message with seq 2 to pending delivery queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST message with seq 3 to pending delivery queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST message with seq 4 to pending delivery queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST message with seq 5 to pending delivery queue
Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST message with seq 6 to pending delivery queue
Aug 26 06:14:35 debug   [SYNC  ] sync.c:232 Committing synchronization for corosync cluster closed process group service v1.01
Aug 26 06:14:35 notice  [MAIN  ] main.c:273 Completed service synchronization, ready to provide service.


Nothing interesting seems to be in this part of the log; the interesting
things probably happen later.

Regards,
   Honza



_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

