Re: Top 10 services/servers/etc

Kevin Fenzi <kevin@xxxxxxxxx> · Sat, 5 Mar 2011 13:35:53 -0700

On Fri, 04 Mar 2011 20:31:05 -0500
Gareth Marchant <gareth@xxxxxxxxxxxx> wrote:

> How about devices? I am sure there are routers, switches, gateways,
> firewalls and maybe storage hardware monitored by nagios that are
> high priority/highly critical and worthy of test? 

Well, most of the routers/switches/gateways are not under our control.
They are controlled by whatever facility we have machines in.
Monitoring of gateways is mostly done by monitoring the VPNs we use
between sites.

There is some storage backend stuff in phx2 that should probably be
monitored.

> How deeply should testing go or, put another way, how much go-live
> risk can be tolerated? Should a gap analysis of stage environment to
> production be performed prior to making a nagios test plan? I am not
> sure how rigorously structured this upgrade plan should be!

Yeah, not sure either. ;) 

I think monitoring could be improved, but it's hard to do that all at
once. One possible plan would be to spin up a new nocXX in production
and get everything showing green in its monitoring before we
retire noc01. The downside is that we might have to give this new
machine/ip access to more things to be able to monitor, and we would be
double monitoring things during the transition. On the plus side we
could check them against each other to make sure we were monitoring
everything we were before and that it was ok. 
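
As a rough sketch of that cross-check (not something we have in place):
assuming each box's checks can be dumped into a mapping of
(host, service) to state, the comparison could look something like this.
The names compare_checks, noc01_checks and nocxx_checks, and the sample
hosts and services, are made up for illustration.

```python
def compare_checks(old, new):
    """Compare two {(host, service): state} mappings.

    Returns three sorted lists of (host, service) keys:
      missing    - checked by the old monitor but not by the new one
      extra      - checked by the new monitor but not by the old one
      mismatched - checked by both, but reporting different states
    """
    old_keys, new_keys = set(old), set(new)
    missing = sorted(old_keys - new_keys)
    extra = sorted(new_keys - old_keys)
    mismatched = sorted(k for k in old_keys & new_keys if old[k] != new[k])
    return missing, extra, mismatched


# Hypothetical sample data; real input would come from each monitor.
noc01_checks = {
    ("app01", "http"): "OK",
    ("db01", "postgres"): "OK",
    ("proxy01", "vpn"): "OK",
}
nocxx_checks = {
    ("app01", "http"): "OK",
    ("db01", "postgres"): "CRITICAL",
    ("noc02", "ping"): "OK",
}

missing, extra, mismatched = compare_checks(noc01_checks, nocxx_checks)
print("missing on new:", missing)
print("only on new:", extra)
print("state differs:", mismatched)
```

An empty "missing" list and an empty "state differs" list would be the
signal that the new box covers everything noc01 did.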

Of course, some services would have to be migrated all at once
(zodbot, dhcp, tftp, meetbot httpd).

Just a thought. 

kevin
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure