On Fri, 04 Mar 2011 20:31:05 -0500
Gareth Marchant <gareth@xxxxxxxxxxxx> wrote:

> How about devices? I am sure there are routers, switches, gateways,
> firewalls and maybe storage hardware monitored by nagios that are
> high priority/highly critical and worthy of test?

Well, many of the routers/switches/gateways are not under our control.
They are controlled by whatever facility we have machines in.
Monitoring of gateways is mostly done via monitoring the vpns we use
between sites. There is some storage backend stuff in phx2 that should
probably be monitored.

> How deeply should testing go or, put another way, how much go-live
> risk can be tolerated? Should a gap analysis of stage environment to
> production be performed prior to making a nagios test plan? I am not
> sure how rigorously structured this upgrade plan should be!

Yeah, not sure either. ;)

I think monitoring could be improved, but it's hard to do that all at
once.

One possible plan would be to spin up a new nocXX in production and get
everything showing green in its monitoring before we retire noc01.

The downside is that we might have to give this new machine/ip access
to more things to be able to monitor them, and we would be double
monitoring things during the transition. On the plus side, we could
check the two against each other to make sure we were monitoring
everything we were before and that it was ok.

Of course some services would have to be migrated all at once
(zodbot, dhcp, tftp, meetbot httpd).

Just a thought.

kevin
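
For the "check them against each other" idea above, here is a minimal sketch. It assumes the checks on each noc can be exported as a mapping from (host, service) to current state. The function name, host names and sample data are all hypothetical and are not taken from our actual nagios config.

```python
def compare_monitoring(old_checks, new_checks):
    """Compare two {(host, service): state} mappings.

    Returns a tuple (missing, extra, mismatched):
      missing    - checks present on the old noc but not on the new one
      extra      - checks present only on the new noc
      mismatched - checks on both nocs that report different states
    """
    old_keys = set(old_checks)
    new_keys = set(new_checks)

    missing = sorted(old_keys - new_keys)
    extra = sorted(new_keys - old_keys)
    mismatched = sorted(
        (key, old_checks[key], new_checks[key])
        for key in old_keys & new_keys
        if old_checks[key] != new_checks[key]
    )
    return missing, extra, mismatched


# Hypothetical sample data, for illustration only.
old_noc = {
    ("host-a", "ping"): "OK",
    ("host-b", "postgres"): "OK",
    ("host-c", "http"): "OK",
}
new_noc = {
    ("host-a", "ping"): "OK",
    ("host-c", "http"): "CRITICAL",
}

missing, extra, mismatched = compare_monitoring(old_noc, new_noc)
print("not yet monitored on new noc:", missing)
print("only on new noc:", extra)
print("state differs:", mismatched)
```

An empty `missing` list and an empty `mismatched` list would be one reasonable signal that the new machine covers what noc01 did. A mismatch may simply mean the new machine/ip lacks access to something.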
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure