monitoring a cluster

Pirabhu Raman <pirabhur@MPI-SoftTech.Com> · Tue, 24 Feb 2004 10:08:58 -0600 (CST)

Hi,

I am interested in monitoring a cluster running Linux for various failures
(hardware and software). Basically, I want to quantify the failures the
cluster experiences over a period of a month or so.
For this purpose I need to periodically scan the
syslogd and klogd messages to detect failures. The issue is
that the volume of messages is quite large and I am not sure exactly what I
am looking for. If people on the list could post some of the major
error/panic/warning messages that I should parse for (to achieve the
objective described above), I would be very glad.

Thanks in advance,
Pirabhu

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/