Re: Endless "A processor joined or left the membership and a new membership" messages even with a single node

On 08/29/2012 03:33 AM, Jan Friesse wrote:
> Jeremy Fitzhardinge wrote:
>> Using the needle-2.0 branch on github (which I assume corresponds to the
>> 2.0.1 release, though I guess I forgot to confirm that), I'm seeing
>
> It's 2.0.1 + additional (not yet released) fixes
>
>> endless "A processor joined or left the membership and a new membership
>> (192.168.169.183:329052) was formed." messages, even with just a single
>> node up.
>>
>> Is this a problem, or expected?  At the very least, it seems like a lot
>> of noise.
>>
>
> This is a problem and it's not expected.
>
>> it also seems somewhat unstable when starting a second node; often the
>> corosync on the first node seems to quietly die, without really
>> indicating any particular problem in its log file.
>>
>
> This is also a problem and not expected.

I'm sorry, this report was the result of a misapplied local change.
BTW, would you be interested in a patch to make the nss dependency
optional (obviously, removing all the crypto aspects of corosync)?  We
need it locally because we're using it in an environment where other
dependencies are very awkward, and network security is handled at
another layer.

I also have other problems with endless retransmit lists in 1.4.4
when a second node comes up, but I'll report those separately once I've
done a bit more investigation (it could be a problem with multicast on
that particular network, but that raises its own set of interesting
questions).

Thanks,
    J

>
> Can you please
>> I'm just using corosync's CPG API.
>>
>> With debugging on, the full cycle of messages is below, using the
>> config:
>>
>> totem {
>>          version: 2
>>          secauth: off
>>          nodeid: 2422866046
>>          threads: 0
>>          clear_node_high_bit: yes
>>          vsftype: none
>>
>>          # Auto-set mcast from ringid
>>          cluster_name: 2f3b3e8cb73e5d3693b3439dc10acbaddf3a2533
>>
>>          interface {
>>                  # The following values need to be set based on your
>> environment
>>                  ringnumber: 0
>>                  bindnetaddr: 192.168.169.0
>>                  # NOTE: corosync uses 2 ports: mcastport and
>> mcastport-1 !
>>                  #mcastaddr:
>>                  #mcastport: 2935
>>          }
>> }
>>
>> logging {
>>          fileline: on
>>          to_stderr: yes
>>          to_logfile: no
>>          to_syslog: no
>>          syslog_facility: daemon
>>          timestamp: on
>>          debug: on
>>          logger_subsys {
>>                subsys: CPG
>>                debug: off
>>          }
>> }
>>
>
> I've tried the same config file and got no unexpected behavior.
>
> Can you please try to send corosync-blackbox (just run
> corosync-blackbox and save /var/lib/corosync/fdata somewhere on the net)?
>
>
>> Any clues?
>>
>> Thanks,
>>      J
>>
>> Aug 26 06:14:35 debug   [SYNC  ] sync.c:232 Committing
>> synchronization for corosync cluster closed process group service v1.01
>> Aug 26 06:14:35 notice  [MAIN  ] main.c:273 Completed service
>> synchronization, ready to provide service.
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2427 releasing messages
>> up to and including 1
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2427 releasing messages
>> up to and including 6
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1979 entering GATHER
>> state from 9.
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3020 Creating commit
>> token because I am the rep.
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1471 Saving state aru 6
>> high seq received 6
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3265 Storing new sequence
>> id for ring 5371c
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2035 entering COMMIT state.
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:4351 got commit token
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2072 entering RECOVERY
>> state.
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2114 TRANS [0] member
>> 192.168.169.183:
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2118 position [0] member
>> 192.168.169.183:
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2122 previous ring seq
>> 53718 rep 192.168.169.183
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2128 aru 6 high delivered
>> 6 received flag 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2226 Did not need to
>> originate any messages in recovery.
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:4351 got commit token
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:4404 Sending initial ORF
>> token
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag
>> is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0
>> high seq received 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag
>> is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0
>> high seq received 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag
>> is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0
>> high seq received 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3666 token retrans flag
>> is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3677 install seq 0 aru 0
>> high seq received 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3696 retrans flag count 4
>> token aru 0 install seq 0 aru 0 0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1487 Resetting old ring
>> state
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1693 recovery to regular 1-0
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1779 Delivering to app 7
>> to 6
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:1905 entering OPERATIONAL
>> state.
>> Aug 26 06:14:35 notice  [TOTEM ] totemsrp.c:1909 A processor joined
>> or left the membership and a new membership (192.168.169.183:341788)
>> was formed.
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message
>> added to pending queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3768 Delivering 0 to 1
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST
>> message with seq 1 to pending delivery queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message
>> added to pending queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message
>> added to pending queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message
>> added to pending queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message
>> added to pending queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:2304 mcasted message
>> added to pending queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3768 Delivering 1 to 6
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST
>> message with seq 2 to pending delivery queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST
>> message with seq 3 to pending delivery queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST
>> message with seq 4 to pending delivery queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST
>> message with seq 5 to pending delivery queue
>> Aug 26 06:14:35 debug   [TOTEM ] totemsrp.c:3837 Delivering MCAST
>> message with seq 6 to pending delivery queue
>> Aug 26 06:14:35 debug   [SYNC  ] sync.c:232 Committing
>> synchronization for corosync cluster closed process group service v1.01
>> Aug 26 06:14:35 notice  [MAIN  ] main.c:273 Completed service
>> synchronization, ready to provide service.
>>
>
> Nothing interesting seems to be in this part of the log; the interesting
> things probably happen later.
>
> Regards,
>   Honza
>

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
