Re: changing a few things in our host mgmt tools

seth vidal <skvidal@xxxxxxxxxxxxxxxxx> · Wed, 23 Mar 2011 14:49:36 -0400

On Fri, 2011-03-18 at 11:04 -0400, seth vidal wrote:
> Hi folks,
>  some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
> 
> 
> 1. puppetd sucks..... memory. Right now we have puppetd running on every
> box and it wakes up every half hour and runs itself. This is fine but in
> the time where it is not doing anything it just eats memory for no good
> reason. I'd like to suggest we move to a cron-driven model instead of
> puppetd. I'd write a simple cron job that runs every half hour to run
> puppetd, if a lock file is not found. Pretty straightforward, of
> course. 

This is done.
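
For anyone who wants a picture of the cron-driven model, here is a minimal sketch in Python. It is not the actual job that was deployed. The lock file path, the puppetd flags, and the function name are all assumptions chosen for illustration.

```python
import os
import subprocess

# Assumed values for illustration only; the real lock path and the
# puppetd invocation depend on the local install.
LOCK_FILE = "/var/lib/puppet/state/puppetdlock"
PUPPET_CMD = ("puppetd", "--onetime", "--no-daemonize")


def run_if_unlocked(lock_file=LOCK_FILE, command=PUPPET_CMD):
    """Run `command` once unless `lock_file` exists.

    Returns the command's exit code, or None if the run was skipped
    because the lock file was present.
    """
    if os.path.exists(lock_file):
        # Either a run is still in progress or someone disabled runs
        # on purpose; in both cases do nothing this time around.
        return None
    return subprocess.call(list(command))
```

A small script that calls run_if_unlocked() could then be scheduled from cron every half hour, for example with a `*/30 * * * *` entry. Since nothing stays resident between runs, the memory is only used while puppet is actually working.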

> 
> 2. monitoring if puppetd has run properly:
>    two things we want to know about puppet runs:
>    a. when they last happened per-box
>    b. if they fell over in a horrible way.
> 
>     (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
>     (b) can be done via the cron job - ie: taking error output from the
> puppet run and mailing to people until we fix it! :)

I've written this and it can now submit issues via nsca (via func). One
problem: it appears that our puppet node names do not match our nagios
host names, A LOT. So we'll need to get some aliases in place so that
the submissions work.
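
To make the staleness check and the alias problem concrete, here is a rough sketch in Python. It is not the script mentioned above. The yaml directory, the alias table, the service name, and all function names are assumptions. The only part taken from elsewhere is the tab-separated line layout (host, service, return code, output) that send_nsca reads for passive service checks.

```python
import os
import time

# Assumed location of the per-node $nodename.yaml files on the puppetmaster.
YAML_DIR = "/var/lib/puppet/yaml/node"

# Hypothetical alias table: puppet node name -> nagios host name.
ALIASES = {"app01.example.org": "app01"}


def stale_nodes(yaml_dir=YAML_DIR, max_age=3600, now=None):
    """Return the node names whose yaml file is older than max_age seconds."""
    now = time.time() if now is None else now
    stale = []
    for name in os.listdir(yaml_dir):
        if not name.endswith(".yaml"):
            continue
        path = os.path.join(yaml_dir, name)
        if now - os.path.getmtime(path) > max_age:
            stale.append(name[:-len(".yaml")])
    return sorted(stale)


def nagios_host(node, aliases=ALIASES):
    """Map a puppet node name to its nagios host name, if an alias exists."""
    return aliases.get(node, node)


def nsca_line(node, service="puppet-run", code=2, output="no recent puppet run"):
    """Format one passive check result in send_nsca's tab-separated layout."""
    return "\t".join([nagios_host(node), service, str(code), output]) + "\n"
```

Each line from nsca_line() would be fed to send_nsca for the nodes that stale_nodes() reports. Without an alias entry the puppet node name is passed through unchanged, which is exactly the case where nagios would not recognize the host.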

-sv

_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure