Re: startup considerations for v 2.x

Steven Dake <sdake@xxxxxxxxxx> · Thu, 05 Apr 2012 08:20:13 -0700

On 04/05/2012 08:11 AM, dan clark wrote:
> The corosync daemon v1.99.9 may fail on startup when misconfigured
> with an older configuration file.  If the value "rrp_mode: active" is
> added to the example configuration, the backtrace below is easily
> reproduced, but perhaps this has already been addressed.  Without the
> "rrp_mode:" setting in the example file, the daemon starts up.  In
> addition, the daemon does not seem to start with uidgid.d files that
> were valid in previous releases.
> 
> Perhaps the parsing code could be augmented to identify the new
> functionality requirements and default to reasonable values to aid in
> migration from older configuration files to newer ones?  Would it be
> a reasonable requirement that the upgrade path is to install the new
> software and run it on older configuration files?

That was the plan - perhaps backwards compatibility file testing wasn't
sufficient...

> Is it important to use the "QUORUM" subsystem for logging, and should
> it be used in older releases?
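
The idea of defaulting to reasonable values during migration can be sketched as follows. This is only an illustration, not the corosync parser: the function name `migrate`, the `DEFAULTS` table, and the `LEGACY_KEYS` set are hypothetical, chosen from options that appear in the two configuration files quoted in this message.

```python
# Hypothetical tables for illustration only; not taken from the corosync source.
DEFAULTS = {"crypto_cipher": "none", "crypto_hash": "none"}
LEGACY_KEYS = {"compatibility"}


def migrate(settings):
    """Return (migrated_settings, notes) for a flat dict of old options.

    Legacy options are skipped with a note instead of aborting startup,
    and options missing from the old file get a default value.
    """
    migrated = {}
    notes = []
    for key, value in settings.items():
        if key in LEGACY_KEYS:
            notes.append(f"ignoring legacy option {key!r}")
        else:
            migrated[key] = value
    for key, value in DEFAULTS.items():
        if key not in migrated:
            migrated[key] = value
            notes.append(f"defaulting {key!r} to {value!r}")
    return migrated, notes
```

Logging each note at startup would let an administrator see exactly what was assumed on their behalf, rather than having the daemon exit or crash.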
> 
> I apologize in advance if these issues were already discussed.
> 
> dan
> 
> 
> a) tried with a single uidgid.d file (see below).  Is there a change in
> the format of this file?
> [root@tarn exec]# /usr/sbin/corosync -f
> notice  [MAIN  ] Corosync Cluster Engine ('1.99.9'): started and ready to provide service.
> info    [MAIN  ] Corosync built-in features:
> error   [MAIN  ] uidgid: Only uid and gid are allowed items
> error   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1078.
> 
> % cat /etc/corosync/uidgid.d/auser
> uidgid {
>     uid: auser
>     gid: auser
> }
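
The error message above says that only uid and gid are allowed, yet the file shown appears to contain nothing else. A quick independent check of the file can help rule out stray text. The following is a minimal sketch of such a check, assuming the simple `uidgid { key: value }` layout shown above and `#` comments; `check_uidgid` is a hypothetical helper and does not reproduce the daemon's own parser.

```python
import re

ALLOWED_KEYS = {"uid", "gid"}


def check_uidgid(text):
    """Return a list of problems found in the text of a uidgid.d file."""
    problems = []
    inside = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if re.fullmatch(r"uidgid\s*\{", line):
            inside = True
        elif line == "}":
            inside = False
        elif inside and ":" in line:
            key = line.split(":", 1)[0].strip()
            if key not in ALLOWED_KEYS:
                problems.append(f"line {lineno}: unexpected key {key!r}")
        else:
            problems.append(f"line {lineno}: unrecognized text {line!r}")
    return problems
```

If this returns an empty list for a file that the daemon still rejects, the problem is more likely in the daemon's handling of the file than in the file itself.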
> b) tried removing all uidgid.d files and received the following crash:
> (no interfaces defined for the stats structure)
> (gdb) run -f
> Starting program: /local/dclark/Downloads/corosync-1.99.9/exec/corosync -f
> [Thread debugging using libthread_db enabled]
> notice  [MAIN  ] Corosync Cluster Engine ('1.99.9'): started and ready to provide service.
> info    [MAIN  ] Corosync built-in features:
> [New Thread 0x7ffff6955700 (LWP 27007)]
> 
> Program received signal SIGSEGV, Segmentation fault.
> active_instance_initialize (rrp_instance=0x740cf0, interface_count=1)
>     at totemrrp.c:1272
> 1272            stats_set_interface_faulty (rrp_instance, i, 0);
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.47.el6_2.9.x86_64 nspr-4.8.9-3.el6_2.x86_64
> nss-3.13.1-7.el6_2.x86_64 nss-util-3.13.1-3.el6_2.x86_64
> zlib-1.2.3-25.el6.x86_64
> (gdb) where
> #0  active_instance_initialize (rrp_instance=0x740cf0, interface_count=1)
>     at totemrrp.c:1272
> #1  0x00007ffff7dce2ea in totemrrp_algorithm_set (poll_handle=0x6f7870,
>     rrp_context=0x7ffff5f4a380, totem_config=0x7fffffffde60,
>     stats=<value optimized out>, context=0x7ffff5f18010,
>     deliver_fn=0x7ffff7dceb60 <main_deliver_fn>,
>     iface_change_fn=0x7ffff7dd1530 <main_iface_change_fn>,
>     token_seqid_get=0x7ffff7dce370 <main_token_seqid_get>,
>     msgs_missing=0x7ffff7dce390 <main_msgs_missing>,
>     target_set_completed=0x7ffff7dcfc00 <target_set_completed>)
>     at totemrrp.c:1694
> #2  totemrrp_initialize (poll_handle=0x6f7870, rrp_context=0x7ffff5f4a380,
>     totem_config=0x7fffffffde60, stats=<value optimized out>,
>     context=0x7ffff5f18010, deliver_fn=0x7ffff7dceb60 <main_deliver_fn>,
>     iface_change_fn=0x7ffff7dd1530 <main_iface_change_fn>,
>     token_seqid_get=0x7ffff7dce370 <main_token_seqid_get>,
>     msgs_missing=0x7ffff7dce390 <main_msgs_missing>,
>     target_set_completed=0x7ffff7dcfc00 <target_set_completed>)
>     at totemrrp.c:1887
> #3  0x00007ffff7dd0ef7 in totemsrp_initialize (poll_handle=0x6f7870,
>     srp_context=0x7ffff7fe6e88, totem_config=0x7fffffffde60, stats=0x6ff3e0,
>     deliver_fn=0x7ffff7dd7aa0 <totemmrp_deliver_fn>,
>     confchg_fn=<value optimized out>) at totemsrp.c:934
> ---Type <return> to continue, or q <return> to quit---
> #4  0x00007ffff7dd8dc7 in totempg_initialize (poll_handle=0x6f7870,
>     totem_config=0x7fffffffde60) at totempg.c:757
> #5  0x0000000000417f70 in main (argc=<value optimized out>,
>     argv=<value optimized out>, envp=<value optimized out>) at main.c:1172
> (gdb) p *rrp_instance
> $1 = {poll_handle = 0x0, interfaces = 0x0, rrp_algo = 0x7ffff7fe1a40,
>   context = 0x0, status = {0x0, 0x0}, totemrrp_deliver_fn = 0,
>   totemrrp_iface_change_fn = 0, totemrrp_token_seqid_get = 0,
>   totemrrp_target_set_completed = 0, totemrrp_msgs_missing = 0,
>   totemrrp_log_level_security = 0, totemrrp_log_level_error = 0,
>   totemrrp_log_level_warning = 0, totemrrp_log_level_notice = 0,
>   totemrrp_log_level_debug = 0, totemrrp_subsys_id = 0,
>   totemrrp_log_printf = 0, net_handles = 0x0, rrp_algo_instance = 0x0,
>   interface_count = 0, processor_count = 0, my_nodeid = 0,
>   totem_config = 0x7fffffffde60, deliver_fn_context = {0x0, 0x0},
>   timer_active_test_ring_timeout = {0, 0}, stats = {hdr = {is_dirty = 0,
>       last_updated = 0}, net = 0x0, algo_name = 0x0, faulty = 0x0,
>     interface_count = 0}}
> (gdb)
> 
> Here is the new example configuration file, modified by adding
> rrp_mode to cause the crash:
> 
> 
> # cat corosync.conf
> # Please read the corosync.conf.5 manual page
> totem {
>     version: 2
>     rrp_mode: active
> 
>     # crypto_cipher and crypto_hash: Used for mutual node authentication.
>     # If you choose to enable this, then do remember to create a shared
>     # secret with "corosync-keygen".
>     crypto_cipher: none
>     crypto_hash: none
> 
>     # interface: define at least one interface to communicate
>     # over. If you define more than one interface stanza, you must
>     # also set rrp_mode.
>     interface {
>         # Rings must be consecutively numbered, starting at 0.
>         ringnumber: 0
>         # This is normally the *network* address of the
>         # interface to bind to. This ensures that you can use
>         # identical instances of this configuration file
>         # across all your cluster nodes, without having to
>         # modify this option.
>         bindnetaddr: 10.109.20.0
>         # However, if you have multiple physical network
>         # interfaces configured for the same subnet, then the
>         # network address alone is not sufficient to identify
>         # the interface Corosync should bind to. In that case,
>         # configure the *host* address of the interface
>         # instead:
>         # bindnetaddr: 192.168.1.1
>         # When selecting a multicast address, consider RFC
>         # 2365 (which, among other things, specifies that
>         # 239.255.x.x addresses are left to the discretion of
>         # the network administrator). Do not reuse multicast
>         # addresses across multiple Corosync clusters sharing
>         # the same network.
>         # mcastaddr: 239.255.1.1
>         mcastaddr: 239.192.105.99
>         # Corosync uses the port you specify here for UDP
>         # messaging, and also the immediately preceding
>         # port. Thus if you set this to 5405, Corosync sends
>         # messages over UDP ports 5405 and 5404.
>         mcastport: 5405
>         # Time-to-live for cluster communication packets. The
>         # number of hops (routers) that this ring will allow
>         # itself to pass. Note that multicast routing must be
>         # specifically enabled on most network routers.
>         ttl: 1
>     }
> }
> 
> logging {
>     # Log the source file and line where messages are being
>     # generated. When in doubt, leave off. Potentially useful for
>     # debugging.
>     fileline: off
>     # Log to standard error. When in doubt, set to no. Useful when
>     # running in the foreground (when invoking "corosync -f")
>     to_stderr: yes
>     # Log to a log file. When set to "no", the "logfile" option
>     # must not be set.
>     to_logfile: yes
>     logfile: /var/log/cluster/corosync.log
>     # Log to the system log daemon. When in doubt, set to yes.
>     to_syslog: yes
>     # Log debug messages (very verbose). When in doubt, leave off.
>     debug: off
>     # Log messages with time stamps. When in doubt, set to on
>     # (unless you are only logging to syslog, where double
>     # timestamps can be annoying).
>     timestamp: on
>     logger_subsys {
>         subsys: QUORUM
>         debug: off
>     }
> }
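
The configuration above sets rrp_mode while defining a single interface stanza, whereas its own comment only describes the opposite case (more than one interface requires rrp_mode). A pre-flight check for both mismatches could look like the sketch below. It is an illustration under stated assumptions: `parse_conf` and `rrp_warnings` are hypothetical helpers, the parser handles only the `key: value` and `name { ... }` forms seen in this file, and treating a missing rrp_mode as "none" is an assumption.

```python
def parse_conf(text):
    """Parse 'key: value' lines and 'name { ... }' blocks into nested dicts.

    Every block is stored in a list under its name, so repeated
    'interface' stanzas are all kept.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            child = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(child)
            stack.append(child)
        elif line == "}":
            if len(stack) > 1:  # tolerate an unbalanced closing brace
                stack.pop()
        elif ":" in line:
            key, value = line.split(":", 1)
            stack[-1][key.strip()] = value.strip()
    return root


def rrp_warnings(conf):
    """Flag mismatches between rrp_mode and the number of interface stanzas."""
    warnings = []
    for totem in conf.get("totem", []):
        mode = totem.get("rrp_mode", "none")  # assumed default
        rings = len(totem.get("interface", []))
        if mode != "none" and rings < 2:
            warnings.append(
                f"rrp_mode is {mode!r} but only {rings} interface stanza(s) defined"
            )
        if mode == "none" and rings > 1:
            warnings.append(f"{rings} interface stanzas defined but rrp_mode is not set")
    return warnings
```

Run against the file above, this reports the first mismatch. Whether the daemon should reject such a file or fall back to a single-ring mode is a design decision this sketch does not settle.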
> 
> 
> Note: the following is an example of an older configuration file.
> #
> # configuration for corosync
> # Please read the corosync.conf.5 manual page
> #
> compatibility: whitetank
> 
> totem {
>         version: 2
>         rrp_mode: active
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 10.0.0.0
>                 mcastaddr: 239.192.109.99
>                 mcastport: 5407
>         }
> }
> 
> logging {
>         timestamp: on
>         fileline: on
>         function_name: on
>         to_stderr: yes
>         to_logfile: no
>         to_syslog: yes
>         logfile: /var/log/corosync
>         # logfile_priority - alert|crit|debug|emerg|err|info|notice|warning
> #       logfile_priority: info
>         # syslog_facility - daemon, local0, ... local7
> #       syslog_priority: info
>         debug: off
>         trace: none|enter|leave|trace1|trace2|trace3
>         logger_subsys {
>                 subsys: AMF
>                 debug: off
>         }
> }
> 
> amf {
>         mode: disabled
> }
> 
> 
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
