On Fri, Mar 18, 2011 at 11:04:32AM -0400, seth vidal wrote:
> Hi folks,
> some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
>
>
> 1. puppetd sucks..... memory. Right now we have puppetd running on every
> box and it wakes up every half hour and runs itself. This is fine but in
> the time where it is not doing anything it just eats memory for no good
> reason. I'd like to suggest we move to a cron-driven model instead of
> puppetd. I'd write a simple cron job that runs every half hour to run
> puppetd, if a lock file is not found. Pretty straightforward, of
> course.
>
+1

Might need to update kickstarts and/or the SOP pages:
http://fedoraproject.org/wiki/Kickstart_Infrastructure_SOP
http://fedoraproject.org/wiki/Puppet_Infrastructure_SOP

> 2. monitoring if puppetd has run properly:
> two things we want to know about puppet runs:
> a. when they last happened per-box
> b. if they fell over in a horrible way.
>
> (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
> (b) can be done via the cron job, i.e. taking error output from the
> puppet run and mailing it to people until we fix it! :)
>
+1

> 3. sign** boxes. problems here:
> a. These boxes are falling out of date, repeatedly, b/c they aren't
> in our normal updating path.
> b. these boxes don't email out to the same locations as the other
> boxes
> c. these boxes don't get FAS password updates properly
> d. these boxes don't get config changes normally via puppet
>
> (a) I'd like to suggest that they be put into a normal updating path
> and/or we set up a nag mail to tell us about them
> (b) obviously, fix their mail configs
> (c) fasclient is failing b/c of a missing token b/c, most likely, of
> (d)
>
> I'm open to suggestions on those but it is a bit annoying b/c while I
> understand their 'sensitivity' I think our way of treating them is
> making the problem WORSE not better.
>
I agree with your assessment. I guess we need to tell releng our concerns
and figure out what needs to be done.

For a: perhaps have releng okay us/a specific subset of sysadmins to run
updates along with all the other updates.

-Toshio
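For point 1, here is a minimal Python sketch of the cron-driven model: run puppet once unless a lock file exists, and pass error output through so cron mails it (which also covers point 2b). The lock file path and the `puppetd --onetime --no-daemonize` invocation are assumptions, not taken from the thread; check them against the installed puppet version.

```python
import os
import subprocess
import sys

# Hypothetical path: an admin touches this file to pause scheduled puppet runs.
LOCK_FILE = "/var/run/puppet-cron.lock"

# Assumed one-shot invocation; verify the flags for the puppet version in use.
PUPPET_CMD = ["puppetd", "--onetime", "--no-daemonize"]


def run_puppet_once(lock_file=LOCK_FILE, cmd=PUPPET_CMD):
    """Run puppet a single time unless the lock file exists.

    Returns None if the run was skipped, otherwise the command's exit code.
    Routine stdout is discarded; stderr (plus a status line) is echoed to our
    own stderr on failure or warnings, so cron mails it to MAILTO.
    """
    if os.path.exists(lock_file):
        return None  # runs are paused: print nothing, so cron sends no mail
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    if proc.returncode != 0 or proc.stderr.strip():
        sys.stderr.write(proc.stderr)
        sys.stderr.write("puppet run exited with status %d\n" % proc.returncode)
    return proc.returncode
```

A hypothetical wrapper script would end with `status = run_puppet_once()` and `sys.exit(0 if status is None else status)`, scheduled every half hour from a file in `/etc/cron.d`, e.g. `*/30 * * * * root /usr/local/bin/puppet-cron` (script name hypothetical). Since cron mails any output to `MAILTO`, a clean run sends nothing and a broken one lands in someone's inbox.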
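For point 2a, a sketch of the staleness check: list the `<nodename>.yaml` files on the puppetmaster and report any whose modification time is more than an hour old. The directory below is an assumption about where the puppetmaster keeps these files, not something stated in the thread, and so is the premise that the file's mtime tracks the node's last run.

```python
import os
import time

# Assumed location of the per-node YAML files on the puppetmaster.
YAML_DIR = "/var/lib/puppet/yaml/node"
MAX_AGE = 60 * 60  # one hour, as proposed


def stale_nodes(yaml_dir=YAML_DIR, max_age=MAX_AGE, now=None):
    """Return sorted node names whose <nodename>.yaml is older than max_age seconds."""
    if now is None:
        now = time.time()
    stale = []
    for entry in os.listdir(yaml_dir):
        if not entry.endswith(".yaml"):
            continue  # ignore anything that is not a node file
        path = os.path.join(yaml_dir, entry)
        if now - os.path.getmtime(path) > max_age:
            stale.append(entry[:-len(".yaml")])
    return sorted(stale)
```

Printing the result of `stale_nodes()` from a cron job on the puppetmaster would mail out the list of boxes that have not checked in recently, and stay silent when the list is empty.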
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure