Greetings all,
I'm running RHEL4U5.
I have a bunch of services on my cluster w/ access via redundant
directors.
I've created a generic service-checking script, which I'm specifying in
lvs.cf's 'send_program' config parameter.
The script is attached to this post; see it for how it works with the
symlinks described below.
I create a symlink to the script for every service I want to check, with
the port to hit embedded in the symlink's name, as in:
/sbin/lvs-<port>.sh
So the symlink that checks ssh availability, for instance, is:
/sbin/lvs-22.sh
The script works fine, and returns the first contiguous block of
[[:alnum:]] text data from the connection attempt for use with the
expect line of lvs.cf.
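
For readers without the attachment, the two ideas above (port taken from the symlink name, first run of [[:alnum:]] characters taken from the reply) can be sketched as follows. This is not the attached script, which is a shell script; it is an illustration in C, and the function names port_from_name and first_alnum_block are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the port from a name of the form
 * [dir/]lvs-<port>.sh.  Returns the port, or -1 if the name does
 * not follow that convention. */
static int
port_from_name (const char *path)
{
  const char *base = strrchr (path, '/');
  char *end;
  long port;

  base = (base != NULL) ? base + 1 : path;
  if (strncmp (base, "lvs-", 4) != 0)
    return -1;
  if (!isdigit ((unsigned char) base[4]))
    return -1;
  port = strtol (base + 4, &end, 10);
  if (strcmp (end, ".sh") != 0 || port < 1 || port > 65535)
    return -1;
  return (int) port;
}

/* Hypothetical helper: copy the first contiguous run of alphanumeric
 * characters in reply into out (always NUL-terminated, truncated to
 * fit).  Returns the number of characters copied. */
static size_t
first_alnum_block (const char *reply, char *out, size_t outLen)
{
  size_t n = 0;

  if (outLen == 0)
    return 0;
  /* Skip anything before the first alphanumeric character. */
  while (*reply != '\0' && !isalnum ((unsigned char) *reply))
    reply++;
  /* Copy until the run ends or the output buffer is full. */
  while (isalnum ((unsigned char) *reply) && n + 1 < outLen)
    out[n++] = *reply++;
  out[n] = '\0';
  return n;
}
```

With these helpers, port_from_name ("/sbin/lvs-22.sh") yields 22, and a reply that begins "SSH-2.0-" yields the block "SSH", which is the kind of value the expect line would then be matched against.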
The problem is, when nanny is spawned by pulse, all of the nanny
processes segfault.
> Nov 13 14:40:44 kop-sds-dir-01 lvs[17740]: create_monitor for ssh_access/kop-sds-01 running as pid 17749
> Nov 13 14:40:44 kop-sds-dir-01 nanny[17749]: making 10.32.12.11:22 available
> Nov 13 14:40:44 kop-sds-dir-01 kernel: nanny[17749]: segfault at 000000000000006c rip 000000335e570810 rsp 0000007fbfffe978 error 4
This occurs almost instantly for every nanny process.
Can anyone venture a guess as to what is happening?
Try running nanny manually in the foreground and see if you get any error messages. RHEL5 nanny (0.8.4) has a bug where it segfaults when printing syslog messages longer than 80 characters. It could be that. The patch is below.
--- util.new 2007-10-10 13:27:43.000000000 -0700
***************
*** 49,55 ****
    while (1)
      {
!       ret = vsnprintf (buf, bufLen, format, args);
        if ((ret > -1) && (ret < bufLen))
          {
            break;
--- 49,58 ----
    while (1)
      {
!       va_list try_args;
!
!       va_copy(try_args, args);
!       ret = vsnprintf (buf, bufLen, format, try_args);
!       va_end(try_args);
        if ((ret > -1) && (ret < bufLen))
          {
            break;
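
The reason the patch helps: once vsnprintf has walked a va_list, that list is indeterminate, so a retry with a larger buffer must use a fresh copy made with va_copy. The patched loop can be sketched as a standalone program fragment. This is not nanny's actual util.c; the names format_alloc_v and format_alloc, the 16-byte starting size, and the growth policy are all assumptions made for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not nanny's code): format into a heap buffer,
 * growing it until the whole message fits.  The caller frees the
 * result.  Returns NULL if memory runs out. */
static char *
format_alloc_v (const char *format, va_list args)
{
  size_t bufLen = 16;   /* deliberately small so the retry path runs */
  char *buf = malloc (bufLen);

  while (buf != NULL)
    {
      va_list try_args;
      char *bigger;
      int ret;

      /* Each attempt formats from its own copy of the arguments;
       * the original args is never passed to vsnprintf directly. */
      va_copy (try_args, args);
      ret = vsnprintf (buf, bufLen, format, try_args);
      va_end (try_args);

      if ((ret > -1) && ((size_t) ret < bufLen))
        break;                          /* the message fit */

      /* C99 reports the length needed; older libcs return -1. */
      bufLen = (ret > -1) ? (size_t) ret + 1 : bufLen * 2;
      bigger = realloc (buf, bufLen);
      if (bigger == NULL)
        {
          free (buf);
          return NULL;
        }
      buf = bigger;
    }
  return buf;
}

/* Variadic front end for the helper above. */
static char *
format_alloc (const char *format, ...)
{
  va_list args;
  char *msg;

  va_start (args, format);
  msg = format_alloc_v (format, args);
  va_end (args);
  return msg;
}
```

Calling format_alloc with a message longer than the starting buffer forces at least one retry, which is the path that reused the consumed va_list before the patch.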
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster