On Fri, 2011-03-18 at 11:04 -0400, seth vidal wrote:
> Hi folks,
> some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
>
>
> 1. puppetd sucks..... memory. Right now we have puppetd running on every
> box and it wakes up every half hour and runs itself. This is fine but in
> the time where it is not doing anything it just eats memory for no good
> reason. I'd like to suggest we move to a cron-driven model instead of
> puppetd. I'd write a simple cron job that runs every half hour to run
> puppetd, if a lock file is not found. Pretty straightforward, of
> course.

This is done.

>
> 2. monitoring if puppetd has run properly:
> two things we want to know about puppet runs:
> a. when they last happened per-box
> b. if they fell over in a horrible way.
>
> (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
> (b) can be done via the cron job - ie: taking error output from the
> puppet run and mailing to people until we fix it! :)

I've written this and it can now submit issues via nsca (via func).

One problem: it appears our puppet node names do not match our nagios
host names, A LOT. So we'll need to get some aliases in place so the
checks work.

-sv

_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
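
For anyone who wants to see the shape of the check in 2(a), here is a minimal sketch in Python. It is not the script mentioned above, just an illustration under some assumptions: the function name stale_nodes, the yaml_dir argument, and the idea of using file modification time as "age" are all mine, and the actual directory on the puppetmaster is not given in this thread, so it is left as a parameter.

```python
import os
import time


def stale_nodes(yaml_dir, max_age_seconds=3600, now=None):
    """Return sorted node names whose <nodename>.yaml is older than the limit.

    yaml_dir is a hypothetical parameter: the directory on the puppetmaster
    holding one <nodename>.yaml file per node. "Older" is judged by the
    file's modification time, which is an assumption of this sketch.
    """
    if now is None:
        now = time.time()
    suffix = ".yaml"
    stale = []
    for entry in os.listdir(yaml_dir):
        # Ignore anything that is not a per-node yaml file.
        if not entry.endswith(suffix):
            continue
        path = os.path.join(yaml_dir, entry)
        age = now - os.path.getmtime(path)
        if age > max_age_seconds:
            # Strip the suffix to recover the node name.
            stale.append(entry[: -len(suffix)])
    return sorted(stale)
```

Calling stale_nodes() on the yaml directory and printing the result, one name per line, gives a list that could be mailed out or fed to a monitoring check. The names it returns are puppet node names, so the nagios alias problem mentioned above would still apply to anything submitted from it.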