"corosync-cfgtool -s" hangs for hours

Hi,

we are running a two-node cluster on SLES11 SP1 machines with
HA-Extension and the bundled Corosync+Pacemaker packages.
Corosync is version 1.1.5.

The machines have been running for over a year now, and despite some
problems with our setup, we have never before seen the problem we are
now facing:

We're running a small monitoring script that checks the status of both
corosync rings every three minutes and submits the result via mail to
our monitoring server.
It basically wraps the output of "corosync-cfgtool -s" in an email.

For the past few days, the "corosync-cfgtool -s" call has been hanging
for multiple hours at a time. While it hangs, it blocks all subsequent
calls to "corosync-cfgtool -s" (or "-r", for that matter) on that system:

USER       PID %CPU %MEM   VSZ  RSS TTY STAT START  TIME COMMAND
root     17963  0.0  0.0 14308  704 ?   D    07:36  0:00 /usr/sbin/corosync-cfgtool -s
root     18587  0.0  0.0 14308  704 ?   D    07:39  0:00 /usr/sbin/corosync-cfgtool -s
root     19363  0.0  0.0 14308  708 ?   D    07:42  0:00 /usr/sbin/corosync-cfgtool -s
root     20061  0.0  0.0 14308  704 ?   D    07:45  0:00 /usr/sbin/corosync-cfgtool -s
root     20751  0.0  0.0 14308  708 ?   D    07:48  0:00 /usr/sbin/corosync-cfgtool -s
root     21409  0.0  0.0 14308  708 ?   D    07:51  0:00 /usr/sbin/corosync-cfgtool -s
root     22106  0.0  0.0 14308  704 ?   D    07:54  0:00 /usr/sbin/corosync-cfgtool -s
root     22854  0.0  0.0 14316  716 ?   D    07:57  0:00 /usr/sbin/corosync-cfgtool -s
root     23634  0.0  0.0 14308  704 ?   D    08:00  0:00 /usr/sbin/corosync-cfgtool -s
root     24475  0.0  0.0 14308  708 ?   D    08:03  0:00 /usr/sbin/corosync-cfgtool -s
root     25250  0.0  0.0 14308  704 ?   D    08:06  0:00 /usr/sbin/corosync-cfgtool -s
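
The "D" in the STAT column means uninterruptible sleep, so these processes are waiting inside the kernel and will not react to signals until the wait ends. As a rough illustration of how a listing like the one above could be checked automatically, here is a minimal Python 3 sketch. The function name blocked_pids is hypothetical, and it assumes the 11-column "ps aux" layout shown above.

```python
def blocked_pids(ps_output, command="corosync-cfgtool"):
    """Return the PIDs of `command` processes whose STAT starts with 'D'.

    Assumes `ps aux` style output with 11 columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so that COMMAND keeps its arguments.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and anything malformed
        pid, stat, cmd = fields[1], fields[7], fields[10]
        if stat.startswith("D") and command in cmd:
            pids.append(int(pid))
    return pids
```

Feeding it the output of "ps aux" would return the PIDs of the stuck calls, which could then be included in the monitoring mail.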

After a few hours the piled-up processes vanish and everything works as
expected, until the next occurrence. Executing "corosync-cfgtool -s" on
the other node works without any problem.
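
This does not address the cause, but the pile-up itself could be avoided by letting the monitoring script skip a run while an earlier one is still stuck. The following is a minimal Python 3 sketch of that idea, not our actual script. The lock path and the name run_check are hypothetical; it uses an exclusive, non-blocking flock() on a lock file.

```python
import fcntl
import subprocess

LOCK_PATH = "/tmp/ring-check.lock"  # hypothetical location, adjust as needed


def run_check(cmd=("corosync-cfgtool", "-s"), lock_path=LOCK_PATH):
    """Run `cmd` unless an earlier invocation still holds the lock.

    Returns the combined stdout/stderr of the command as text,
    or None if the check was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        # The lock is released when lock_file is closed on leaving the block.
        return result.stdout
```

A None result could be reported as "previous check still running", so that a hang shows up as a single alert instead of a growing list of blocked processes.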

Does anyone have an idea what could cause this? We did not change
anything in the system's configuration; the problem just appeared out
of the blue.
There is also no hint in the logs.

Our corosync.conf looks like this:

----- snip -----
aisexec {
         group:  root
         user:   root
}
service {
         use_mgmtd:      yes
         ver:    0
         name:   pacemaker
}
totem {
         rrp_mode:       passive
         join:   100
         max_messages:   20
         vsftype:        none
         consensus:      10000
         secauth:        on
         token_retransmits_before_loss_const:    10
         threads:        16
         token:  10000
         version:        2
         interface {
                 bindnetaddr:    192.168.1.0
                 mcastaddr:      239.250.1.1
                 mcastport:      5405
                 ringnumber:     0
         }
         interface {
                 bindnetaddr:    194.55.223.0
                 mcastaddr:      239.250.1.2
                 mcastport:      5415
                 ringnumber:     1
         }
         clear_node_high_bit:    yes
}
logging {
         to_logfile:     no
         to_syslog:      yes
         debug:  off
         timestamp:      off
         to_stderr:      yes
         fileline:       off
         syslog_facility:        daemon
}
amf {
         mode:   disable
}
----- snip -----

--
Sebastian Kaps
