On Fri, Mar 18, 2011 at 9:04 AM, seth vidal <skvidal@xxxxxxxxxxxxxxxxx> wrote:
> Hi folks,
>  some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
>
>
> 1. puppetd sucks..... memory. Right now we have puppetd running on every
> box and it wakes up every half hour and runs itself. This is fine but in
> the time where it is not doing anything it just eats memory for no good
> reason. I'd like to suggest we move to a cron-driven model instead of
> puppetd. I'd write a simple cron job that runs every half hour to run
> puppetd, if a lock file is not found. Pretty straightforward, of
> course.

I'd be happy to help get this going. I've set up puppet a few times in
this fashion now and it's pretty easy to do.

> 2. monitoring if puppetd has run properly:
>   two things we want to know about puppet runs:
>   a. when they last happened per-box
>   b. if they fell over in a horrible way.

It might be overkill, but puppet dashboard is pretty nice. It's a web
interface, kind of like a nagios for puppet, telling you exactly the
things you want to know above. Plus, it has some pretty graphs :) It
runs on cron jobs too. I've set it up once about a year ago, pretty
nice. I'm sure it's improved some since then. Again, I'd be happy to
help set this up.

>    (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
>    (b) can be done via the cron job - ie: taking error output from the
> puppet run and mailing to people until we fix it! :)
>
> 3. sign** boxes. problems here:
>   a. These boxes are falling out of date, repeatedly, b/c they aren't
> in our normal updating path.
>   b. these boxes don't email out to the same locations as the other
> boxes
>   c. these boxes don't get faspassword updates properly
>   d. these boxes don't get config changes normally via puppet
>
>   (a) I'd like to suggest that they be put into a normal updating path
> and/or we setup a nag mail to tell us about them
>   (b) obviously, fix their mail configs
>   (c) fasclient is failing b/c of a missing token b/c, most likely, of
> (d)
>
>  I'm open to suggestions on those but it is a bit annoying b/c while I
> understand their 'sensitivity' I think our way of treating them is
> making the problem WORSE not better.
>
> -sv
>
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
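
For reference, the staleness check described in 2(a) might look roughly like the sketch below. This is not the script mentioned in the thread, only an illustration of the idea: walk a directory of $nodename.yaml files and report every node whose file was last modified more than an hour ago. The function name `stale_nodes`, its parameters, and the directory path in the usage comment are all assumptions made for the example.

```python
import os
import time


def stale_nodes(yaml_dir, max_age_seconds=3600, now=None):
    """Return sorted node names whose <nodename>.yaml is older than max_age_seconds.

    yaml_dir is assumed to hold one <nodename>.yaml file per node, as
    described in the thread; where that directory lives is site-specific.
    """
    if now is None:
        now = time.time()
    stale = []
    for entry in os.listdir(yaml_dir):
        # splitext only strips the final extension, so dotted hostnames survive.
        name, ext = os.path.splitext(entry)
        if ext != ".yaml":
            continue  # skip anything that is not a node file
        mtime = os.path.getmtime(os.path.join(yaml_dir, entry))
        if now - mtime > max_age_seconds:
            stale.append(name)
    return sorted(stale)


# Example use (the path is a guess, adjust for the actual puppetmaster):
#   for node in stale_nodes("/var/lib/puppet/yaml/node"):
#       print(node)
```

Run from cron on the puppetmaster, any output from a check like this would be mailed to whoever owns the crontab, which fits the "mail people until we fix it" approach in 2(b).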