Re: Slightly [OT] Network Monitoring/Alerting tools

Les Mikesell <lesmikesell@xxxxxxxxx> · Fri, 22 Aug 2008 15:41:47 -0500

Adam Hough wrote:

Polling is the best way to know if a service is actually working, but
OpenNMS also listens for SNMP traps, syslog messages, or xmlrpc events if
you want to send things to it.

So I have seen polling systems fail to properly diagnose a system
when that system was under very heavy load (say, a process
using almost all available resources) and was just too
slow to respond to the poll request.  The monitoring system
(Nagios) would simply mark the system as down.  A client/server
system gives you a better chance of figuring out that you are
running out of memory than a polling system does.

In practical terms, it only matters whether your service is responding 
in a reasonable amount of time or not.  It doesn't really matter if the 
server is still barely alive and some other code is still telling you it 
is OK.  OpenNMS lets you tune what a 'reasonable amount of time' is in 
the polling step and also before any notifications are sent.  Of course 
if your server is prone to running out of resources (cpu, memory, disk, 
bandwidth) you should also be collecting that data via snmp with 
thresholds to notify you before you have an outage or at least graphing 
it so you'll understand the trends.
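
To make the "reasonable amount of time" idea concrete, here is a minimal
Python sketch of a poll with a timeout, plus a rule that only notifies after
several consecutive bad polls. It is not how OpenNMS is configured; the
function names and the default values (3 second timeout, 2 second response
limit, 3 failures) are assumptions chosen for illustration.

```python
import socket
import time


def poll_tcp(host, port, timeout=3.0):
    """Return the TCP connect time in seconds, or None if there is no
    answer within `timeout` seconds."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def should_notify(results, max_response=2.0, failures_needed=3):
    """True once the last `failures_needed` polls were all missing
    (None) or slower than `max_response` seconds."""
    recent = results[-failures_needed:]
    if len(recent) < failures_needed:
        return False
    return all(r is None or r > max_response for r in recent)
```

A server that answers slowly under load is treated the same as one that does
not answer at all, but a single slow poll does not page anyone.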

OpenNMS, from
what I can tell, still does not give me the flexibility that I want or
need and that I get from other systems such as Hobbit (BB) or Nagios.

Example?  Stock SNMP will report most of the usual stuff (interface
bandwidth/errors, memory/disk/cpu use, etc.) and there are ways to extend it
to other values.

But when I was using Big Brother as a monitoring server we were able
to easily write scripts to extend the information that was reported to the
monitoring server.  We were able to use scripts to monitor database
operations for some of our users so they would know how many and which
ones were running.  We were able to use the monitor to look for
hardware problems (AIX/pSeries) and dump the hardware error
log to the monitoring server.  We were able to monitor when
backups were running on the system and whether they had been running for
an unusually long time.

You can do all of that with OpenNMS also; the real issue is how 
difficult it is.  Most of the common things you would want are already 
built in so you don't have to script them - or install scripts on all 
the clients.  Some of the less common things might be more difficult but 
the framework is there to extend.

See below, as I have never tried to configure SNMP other than to get
the basic system information, but I think it would be much harder to
set up SNMP to do some of those tasks than to just write a
simple script in bash, korn, perl, or python.

The server can run a script to poll something if you don't mind the 
efficiency hit - or you can run a script elsewhere and use the included 
send_event.pl script to send the result to the server.
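
As an example of the kind of script you might run elsewhere, here is a
minimal Python sketch of the long-running-backup check mentioned earlier. It
assumes the backup job creates a lock file when it starts and removes it when
it finishes; the lock path and the 6 hour limit are made up. How the result
is handed to the server (send_event.pl or otherwise) is left out on purpose.

```python
import os
import time


def backup_status(lock_path, max_hours=6.0, now=None):
    """Classify a backup by the age of its lock file.

    Returns "idle" if the lock file does not exist, "running" if it is
    no older than `max_hours`, and "overdue" otherwise.
    """
    now = time.time() if now is None else now
    try:
        started = os.path.getmtime(lock_path)
    except OSError:
        return "idle"
    hours = (now - started) / 3600.0
    return "overdue" if hours > max_hours else "running"


if __name__ == "__main__":
    # Hypothetical path; use whatever file your backup job really creates.
    print(backup_status("/var/run/backup.lock"))
```

The `now` parameter exists only so the check can be tested without waiting
six hours.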

Though I will admit I had not known all that much about SNMP other than
to make sure that it is turned off on systems I install, to give bots
one less attack point if they make it past my iptables rules in some
manner.

Don't turn read access off, just use a hard-to-guess community string.
Usually you would block inbound access at your internet firewalls anyway.

My machines live on a university network, and those are notoriously unsafe.
Furthermore, since I deal with systems devoted to research, I have
to allow (ssh) access to the machines from other universities all
over the world.  I cannot trust my public networks and can only trust
my non-routable networks to the extent that no user has chosen an
easy-to-guess password.  On top of that, SNMP has a history of
security issues, though from what I have read SNMPv3 has actually
added security.

Can you name a service that does not have a history of security issues? 
 That means you update them to get the fixes, not that you stop using 
them. If I thought someone could get rich by reading my CPU usage I'd 
worry about it...  If your network is subject to sniffing between the 
server and targets, you might want v3, though.

Running SNMP just seems like an unnecessary
risk when you can have your monitored systems pushing data to the
monitor server(s) and just have to secure the monitor server(s).

The big advantages are that you can monitor the hosts, network 
equipment, UPS's, etc. with the same tool with the same notification 
setup and with a little extra configuration it can understand the 
topology for drawing maps and restricting notifications to a 
router/switch instead of the hundreds of now-unreachable hosts/services 
behind it.
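
The topology point can be sketched in a few lines of Python. This is not
OpenNMS code, just an illustration of the idea: given which device each node
is reached through, report only the failed nodes that have no failed device
upstream of them. The `parent` map and the node names are hypothetical.

```python
def root_causes(parent, down):
    """Return the down nodes that have no other down node upstream.

    `parent` maps each node to the device it is reached through (None
    for the top of the tree); `down` is the set of nodes failing polls.
    """
    causes = set()
    for node in down:
        hop = parent.get(node)
        seen = set()
        # Walk upstream until we hit a down device, the top, or a loop.
        while hop is not None and hop not in down and hop not in seen:
            seen.add(hop)
            hop = parent.get(hop)
        if hop is None or hop in seen:
            causes.add(node)
    return causes


parent = {"core": None, "sw1": "core", "web1": "sw1", "web2": "sw1"}
print(root_causes(parent, {"sw1", "web1", "web2"}))  # prints {'sw1'}
```

With a switch down, the two hosts behind it are suppressed and only the
switch is reported.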

--
  Les Mikesell
   lesmikesell@xxxxxxxxx

--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list