a modern-day SNMP use

"David W. Hankins" <David_Hankins@xxxxxxx> · Fri, 24 Oct 2008 09:26:37 -0700

On Thu, Oct 23, 2008 at 09:41:49AM -0700, Randy Presuhn wrote:
> All very good reasons why doing blind, single-variable MIB walks
> makes no sense.   Are there any commercial products that
> do this routinely?  I'm not aware of any.

This is a tangential issue, so I wanted to reply separately, and
later.

I worked for a backbone company a few years ago (I started there
around the turn of the most recent century) which used SNMP for
billing-related data.  When I started, I would say their system
was a nightmare (and they knew it).  Customers were even in the
know, and would routinely dispute their bills knowing full well
they'd get free net.

The short answer to your question is that there exists today at
least one monitoring package, albeit not commercially nor even freely
available to the public (it is a private tool), which autodetects
all the interfaces and other monitorable variables on every router
in that backbone's network.  The only thing it knows about what it
is going to monitor, in advance, is the hostnames and community
string(s) of the devices it seeks to monitor.

It does this autodetect run once every 30 minutes, or when it restarts
or reconfigs, feeding the results of that search into a table of
'monitored OIDs', which expire out of the table after 90 minutes (or
if it gets an SNMP error indicating future queries would not be
fruitful).  If any device's sysUpTime.0 decreases (sysUpTime was
put at the top of every SNMP GET packet), it alone is re-
autoconfigured.
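
A minimal sketch of the bookkeeping described above, assuming a simple
in-memory table.  The class and method names are hypothetical and not
taken from the private tool; only the 30-minute and 90-minute figures
and the sysUpTime.0 check come from this post.

```python
DETECT_INTERVAL = 30 * 60   # seconds between autodetect runs (from the post)
ENTRY_TTL = 90 * 60         # seconds before an unconfirmed OID expires (from the post)


class MonitoredOids:
    """Hypothetical table of (host, oid) -> time autodetect last confirmed it."""

    def __init__(self):
        self.last_seen = {}     # (host, oid) -> timestamp in seconds
        self.last_uptime = {}   # host -> last sysUpTime.0 value observed

    def confirm(self, host, oid, now):
        # Called by an autodetect run each time it finds the OID again.
        self.last_seen[(host, oid)] = now

    def drop(self, host, oid):
        # Called on an SNMP error that says further queries are pointless.
        self.last_seen.pop((host, oid), None)

    def expire(self, now):
        # Remove entries that no autodetect run has confirmed within the TTL.
        stale = [key for key, t in self.last_seen.items() if now - t > ENTRY_TTL]
        for key in stale:
            del self.last_seen[key]

    def rebooted(self, host, uptime):
        # A decrease in sysUpTime.0 suggests a restart: re-detect that host only.
        previous = self.last_uptime.get(host)
        self.last_uptime[host] = uptime
        return previous is not None and uptime < previous
```

With a 30-minute detect interval and a 90-minute TTL, an OID has to be
missed by roughly three consecutive autodetect runs before it leaves
the table.  Note that sysUpTime.0 is a 32-bit TimeTicks value, so a
decrease can also be a counter wrap after about 497 days; this sketch
treats both cases the same way.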

I would call that 'routine', but again this is fairly subjective.

The monitored OID dataset would be queried once every 30 seconds
(except on the Catalysts, which, due to the limitation I mentioned
earlier, would take 35 seconds to complete a single polling run, so
they ran every 60 to give them a breather...they phased the cats out
over time).  This unusually low polling interval was selected somewhat
arbitrarily: "Because we can."  The router polling run would only take
3 seconds (we had around 300 of them, I seem to recall).  It turned
out to have some advantages; 300 seconds is a long time to wait to see
changes in traffic graphs if you tweak BGP.  Collecting 10 datapoints
to make one billing datapoint meant we would generally not lose
anything if we missed a datapoint or even two.
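
One way to see why a missed datapoint costs nothing, sketched under the
assumption that the polled values are cumulative Counter32 objects such
as ifInOctets.  The post does not say how the ten samples were combined,
so the function below is illustrative only and its name is hypothetical.

```python
COUNTER32_MOD = 2 ** 32


def window_octets(samples):
    """Sum counter deltas across the readings that arrived in one window.

    samples: counter readings in time order, with None for a missed poll.
    """
    present = [s for s in samples if s is not None]
    total = 0
    for prev, cur in zip(present, present[1:]):
        # Modular subtraction absorbs a single Counter32 wrap between readings.
        total += (cur - prev) % COUNTER32_MOD
    return total
```

Because the counter keeps accumulating on the device, the reading after
a gap still carries the octets that passed during the missed poll; only
the time resolution inside the window suffers.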

Because we couldn't count on wide implementation of SNMPv2c among
our monitored devices, we used an initial SNMPv1 query against every
device for the sysObjectID (and a couple other things), and then
assigned one of several "handlers" that worked around quirks and
optimized the process for that particular breed of router or switch.
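
The handler assignment could look something like the following sketch.
The handler functions and the OID prefixes in the table are
placeholders invented for illustration, not real sysObjectID values and
not the tool's actual code.

```python
def generic_handler(host):
    return f"generic:{host}"


def catalyst_handler(host):
    return f"catalyst:{host}"


def router_7200_handler(host):
    return f"7200:{host}"


# sysObjectID prefix -> handler.  These OIDs are placeholders only.
HANDLERS = {
    "1.3.6.1.4.1.99999.1": catalyst_handler,
    "1.3.6.1.4.1.99999.2": router_7200_handler,
}


def pick_handler(sys_object_id):
    # Longest-prefix match on whole sub-identifiers, so that a prefix
    # ending in ".1" never matches an OID continuing with ".10".
    parts = sys_object_id.strip(".").split(".")
    for n in range(len(parts), 0, -1):
        handler = HANDLERS.get(".".join(parts[:n]))
        if handler is not None:
            return handler
    return generic_handler
```

Falling back to a generic handler keeps unknown devices monitored, just
without the per-model workarounds.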

Both the autodetection and polling strategies employed a technique
we found by benchmarking the routers and switches we used: we filled
an SNMP packet with as many variables as possible, so that the reply
packet would approach, but not exceed, 64KB, the size limit of a
single UDP datagram.

It turns out that if you compared the time it took 50 SNMP packets
transmitted in parallel to be replied to, to the time it took a
single SNMP packet with 50 variables that had to be fragmented,
the faster approach was the single, fragmented, packet.  We used a
fine-tuned number of variables for each system, again keyed by
sysObjectID.
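
A rough sketch of the batching step, assuming the tuned number of
variables per packet has already been looked up for the device.  The
model names and counts in the table are made up for illustration; the
real values were found by benchmarking and are not given in this post.

```python
# Tuned variables-per-packet by device model (hypothetical values).
VARS_PER_PACKET = {
    "placeholder-model-a": 50,
    "placeholder-model-b": 20,
}
DEFAULT_VARS = 10  # conservative fallback for unknown models (an assumption)


def batch_oids(oids, model):
    """Split the OID list into per-packet batches for one device."""
    size = VARS_PER_PACKET.get(model, DEFAULT_VARS)
    return [oids[i:i + size] for i in range(0, len(oids), size)]
```

Each batch would then become the variable bindings of one GET request.
This sketch counts variables only; it does not check the encoded size
of the reply against the datagram limit.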

Sensing the remote system's available UDP buffer (again keying
this off of sysObjectID), we would queue multiple such monster
SNMP packets, guaranteeing that at all times throughout a single
polling run, the remote system had an SNMP packet in its buffer
to reply to, and in the air in both directions, even with
retransmissions.
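
The queuing idea can be sketched as a fixed window of outstanding
requests.  The `send` and `wait_reply` callables are hypothetical
stand-ins for the real network code, the window size is assumed to have
been chosen per device, and retransmission and timeouts are left out.

```python
from collections import deque


def poll_with_window(batches, window, send, wait_reply):
    """Keep up to `window` requests outstanding until all batches are answered."""
    pending = deque(batches)
    in_flight = 0
    replies = []
    # Prime the pipe so the device always has work waiting in its buffer.
    while pending and in_flight < window:
        send(pending.popleft())
        in_flight += 1
    # Each reply frees one slot, which is refilled immediately.
    while in_flight:
        replies.append(wait_reply())
        in_flight -= 1
        if pending:
            send(pending.popleft())
            in_flight += 1
    return replies
```

Sizing the window to the device's UDP buffer is what keeps the agent
busy without overflowing it; too large a window would only cause drops
and retransmissions.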

More information on the subject of autodetection...

On some of the handlers, we were able to capitalize on iterating a
series of GETs for every ifIndex from 0 up to ifCount-1 (usually on
ifName.*, but sometimes ifDescr.*, and even other times (those damn
Catalysts) on their enterprise-mib ifAlias.*), and then queuing
additional ifCount,ifCount+1,... GETs to the tail upon every reception
of an error (indicating a gap in ifIndex, which happens on hot swaps).
Successful replies indicated the interface existed, an entry was
created, and we'd start querying the actual data we wanted from other
tables.  This tail-inserting strategy meant there was a continuous
flow of valid configuration information from the remote end, something
neither GETNEXT nor GETBULK can supply.
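
A simplified, sequential sketch of the tail-inserting probe described
above.  The `get_name` callable is a hypothetical stand-in for a GET on
ifName.N that returns None on an SNMP error, and the `max_index` cap is
a safety limit added here, not something mentioned in this post.  The
real tool kept many of these GETs in flight at once.

```python
from collections import deque


def discover_interfaces(if_count, get_name, max_index=4096):
    """Probe ifIndex 0..if_count-1, extending the tail by one per error."""
    queue = deque(range(if_count))
    next_tail = if_count
    found = {}
    while queue:
        index = queue.popleft()
        name = get_name(index)
        if name is not None:
            # The interface exists: record it so data polling can begin.
            found[index] = name
        elif next_tail < max_index:
            # A gap in ifIndex: queue one more candidate at the tail.
            queue.append(next_tail)
            next_tail += 1
    return found
```

For example, with interfaces at indexes 0, 1, 3 and 6 and an interface
count of 4, the probe of index 2 fails and queues 4, the failures at 4
and 5 queue 5 and 6, and the search stops once 6 answers.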

But for example, there were switch implementations which used ifIndex
values that were contrived to survive reboots (a feature desired by
people who used the manually-configured SNMP software of the day),
which meant the ifIndex space was sparsely populated.

There was more than just the one MIB we wanted to monitor as well;
anyone with routers wants to know what their environmental monitoring
is saying...and are the fans running?  We are speaking of course of
the fabulous area of enterprise-specific MIBs.  For most of these,
we were able to enter in (in source code) a set of manual extra
variables to monitor (e.g., 7200s had exactly 3 temperature sensors and
precisely two power supplies, they were always at precisely the same
OIDs), keyed again by sysObjectID.  But for example, we couldn't do
this for MAC address accounting features, a feature commonly used to
track bandwidth traded with multiple peers on one interface.

The pain involved in autodetecting the MAC accounting MIB (because it
could change more frequently than once per half hour) successfully
kept me from using it.  That is a challenge that exceeded either my
interest or my ability, I'm not sure which.  I'm sure the ops team is
still using their expect script to fetch MAC accounting data out of
the routers' CLI.  I find that preferable either way.

So, yes Randy, there are folks who use SNMP routinely in this way,
and there are also folks who refuse to use SNMP in this way.

The lesson is that madness is relative, and you must often do insane
things to avoid even more insane consequences.

-- 
Ash bugud-gul durbatuluk agh burzum-ishi krimpatul.
Why settle for the lesser evil?	 https://secure.isc.org/store/t-shirt/
-- 
David W. Hankins	"If you don't do it right the first time,
Software Engineer		     you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins