On Wed, 2006-09-13 at 09:51 +0200, Alain Moulle wrote:
> >> The self-watchdog patch adds a process which monitors the "real"
> >> clurgmgrd. The monitoring process should be the lower-numbered PID
> >> (it's the parent of the one doing the work).
> >
> >> The monitoring process watches for crash signals (SIGBUS, SIGSEGV,
> >> etc.), and will simply exit if you kill the child with SIGKILL.
> >
> >> So, basically, killing the higher-numbered PID with something like
> >> SIGSEGV should cause the node to reboot.
> >
> >> -- Lon
>
> Thanks Lon, I understand.
> And if I kill -9 (SIGKILL) the higher-numbered PID at test purpose,
> is it expected to reboot or not ?
>
> I see in code :
>                 case SIGCHLD:
>                 case SIGILL:
>                 case SIGFPE:
>                 case SIGSEGV:
>                 case SIGBUS:
>                         setup_signal(i, SIG_DFL);
>                         break;
>                 default:
>                         setup_signal(i, signal_handler);
> but can't conclude for a SIGKILL on higher-numbered PID process ...

No, SIGKILL will just cause the watchdog to commit suicide:

        if (waitpid(child, &status, 0) <= 0)
                continue;

        if (WIFEXITED(status))
                exit(WEXITSTATUS(status));

        if (WIFSIGNALED(status)) {
                if (WTERMSIG(status) == SIGKILL) {
                        clulog(LOG_CRIT,
                               "Watchdog: Daemon killed, exiting\n");
                        raise(SIGKILL);

Use something like SIGSEGV (e.g. to simulate a crash) and the
nanny/watchdog process should reboot the node.

-- Lon
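To see the SIGKILL vs. SIGSEGV distinction without touching a real cluster node, here is a minimal standalone sketch of the same waitpid() decision. It is not the rgmanager code: the enum names and both helper functions are hypothetical, nothing is actually rebooted, and treating every signal other than SIGKILL as a crash is an assumption (the excerpt above is cut off before that branch).

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome names; they do not come from the rgmanager source. */
enum watchdog_action {
    WD_ERROR = -1,   /* fork() or waitpid() failed, or status not understood */
    WD_EXIT_CLEAN,   /* child called exit(): the watchdog passes the status on */
    WD_EXIT_KILLED,  /* child got SIGKILL: the watchdog just exits as well */
    WD_REBOOT        /* child died from another signal: assumed to mean "crash" */
};

/* Map a waitpid() status word to the action a watchdog parent would take. */
static enum watchdog_action classify_status(int status)
{
    if (WIFEXITED(status))
        return WD_EXIT_CLEAN;
    if (WIFSIGNALED(status))
        return (WTERMSIG(status) == SIGKILL) ? WD_EXIT_KILLED : WD_REBOOT;
    return WD_ERROR;
}

/*
 * Fork a stand-in "daemon" that dies from `sig` (or exits normally when
 * sig is 0), wait for it, and report what the watchdog would do.
 */
static enum watchdog_action watch_child_killed_by(int sig)
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return WD_ERROR;

    if (child == 0) {
        /* Child: avoid leaving core files behind during the experiment. */
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);

        if (sig == 0)
            _exit(0);
        signal(sig, SIG_DFL);   /* make sure the default action applies */
        raise(sig);
        _exit(1);               /* only reached if the signal did not kill us */
    }

    /* Parent: same waitpid() pattern as in the excerpt above. */
    if (waitpid(child, &status, 0) <= 0)
        return WD_ERROR;
    return classify_status(status);
}
```

Calling watch_child_killed_by(SIGKILL) gives WD_EXIT_KILLED, while watch_child_killed_by(SIGSEGV) gives WD_REBOOT. That matches the behaviour described above: kill -9 on the higher-numbered PID makes the watchdog go away quietly, and a crash signal is what triggers the reboot path.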