On Fri, 18 Mar 2011 11:04:32 -0400
seth vidal <skvidal@xxxxxxxxxxxxxxxxx> wrote:

> Hi folks,
>  some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
>
>
> 1. puppetd sucks..... memory. Right now we have puppetd running on
> every box and it wakes up every half hour and runs itself. This is
> fine but in the time where it is not doing anything it just eats
> memory for no good reason. I'd like to suggest we move to a
> cron-driven model instead of puppetd. I'd write a simple cron job
> that runs every half hour to run puppetd, if a lock file is not
> found. Pretty straightforward, of course.

I think this is a fine idea. ;)

> 2. monitoring if puppetd has run properly:
>   two things we want to know about puppet runs:
>    a. when they last happened per-box
>    b. if they fell over in a horrible way.
>
>   (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
>   (b) can be done via the cron job - ie: taking error output from
> the puppet run and mailing to people until we fix it! :)

Sounds good.

There are a few boxes where we don't run puppet (the sign* boxes, some
of the backup boxes?).

Options here:

1) If we don't intend to puppet manage them, perhaps we should
completely disable them/comment them out for normal operations?
I know the sign* machines puppet module is intended to set up
everything needed on those machines with a blank db and ready to
configure. So, we would only be using this in setting up new
instances. Disable the rest of the time.

2) Fix the puppet modules on them so that for normal operations they
only do a small number of things... fasClient, etc. I think this is
not intended however for security reasons.

> 3. sign** boxes. problems here:
>   a. These boxes are falling out of date, repeatedly, b/c they aren't
> in our normal updating path.
>   b. these boxes don't email out to the same locations as the other
> boxes
>   c. these boxes don't get faspassword updates properly
>   d. these boxes don't get config changes normally via puppet
>
>  (a) I'd like to suggest that they be put into a normal updating
> path and/or we setup a nag mail to tell us about them
>  (b) obviously, fix their mail configs
>  (c) fasclient is failing b/c of a missing token b/c, most likely,
> of (d)
>
>  I'm open to suggestions on those but it is a bit annoying b/c while
> I understand their 'sensitivity' I think our way of treating them is
> making the problem WORSE not better.

a) I'd agree. Nag mail on updates might be the easy path.

b) yep

c) Perhaps we should just make them non-fas accounts there? Like
backup?

d) We either need to fix the puppet module to not tamper with any db
stuff in normal operations, or not use puppet on them except to set up
initial config.

I know one of the things I was going to look at doing was making a new
sign-{bridge|vault} pair with puppet and see what all it did and if it
got everything set up, etc.

So, short term, I would say we should apply updates, fix mail, set up
nag mail for updates, and fix fasclient, and leave the puppet issue
for later after we look at what all is going on in that module.

kevin
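
For reference, the cron-driven run from point 1, combined with the
error reporting from 2(b), might look roughly like the sketch below.
This is not seth's actual cron job. It assumes the lock file is one
the wrapper itself creates and removes, which is only one reading of
"if a lock file is not found"; the function name, the lock path, and
the command line are all hypothetical and are passed in as arguments.
It relies on the fact that cron mails a job's output to its owner (or
MAILTO), so output is printed only on failure.

```python
import os
import subprocess
import sys


def run_unless_locked(command, lock_path):
    """Run command unless lock_path already exists.

    Returns (ran, returncode, output). The lock file is created
    atomically, so two overlapping cron runs cannot both start.
    """
    try:
        # O_EXCL makes creation fail if the lock file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return (False, None, "")
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold error output into one stream
            text=True,
        )
        return (True, proc.returncode, proc.stdout)
    finally:
        # Always release the lock, even if the command could not be run.
        os.unlink(lock_path)


def main(argv):
    # Hypothetical usage: wrapper.py LOCKFILE COMMAND [ARGS...]
    lock_path, command = argv[1], argv[2:]
    ran, returncode, output = run_unless_locked(command, lock_path)
    if ran and returncode != 0:
        # Stay silent on success; cron mails this output on failure.
        sys.stderr.write(output)
        return returncode
    return 0


if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(main(sys.argv))
```

A stale lock left behind by a killed run would block all later runs
silently, so a real version would probably also want to complain about
an old lock file.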
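
The check described in 2(a) could be sketched along these lines. This
is not the script seth wrote, only a minimal illustration of the idea:
list the nodes whose $nodename.yaml has not been modified within the
last hour. The function name is hypothetical, and the directory
holding the yaml files on the puppetmaster is taken as an argument
because the thread does not say where it lives.

```python
import os
import sys
import time


def stale_nodes(yaml_dir, max_age_seconds=3600, now=None):
    """Return sorted node names whose <nodename>.yaml is older than the limit."""
    now = time.time() if now is None else now
    suffix = ".yaml"
    stale = []
    for entry in os.listdir(yaml_dir):
        if not entry.endswith(suffix):
            continue  # ignore anything that is not a node yaml file
        path = os.path.join(yaml_dir, entry)
        age = now - os.path.getmtime(path)
        if age > max_age_seconds:
            stale.append(entry[:-len(suffix)])
    return sorted(stale)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Print one stale node name per line; no output means all nodes are current.
    for node in stale_nodes(sys.argv[1]):
        print(node)
```

Run from cron on the puppetmaster, this would produce mail only when
some node has gone more than an hour without checking in.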
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure