On Wed, 21 Mar 2012, Kevin Fenzi wrote:
I'd agree collectd off, probably. Or at least a separate one if we
needed to monitor them.
I'm not sure what benefit we get from collectd on transient builders,
though.
On our long-running hosts I understand but not on the builders.
Yeah, the only case I can see is so we could see how loaded they are...
and we might have better ways to tell that.
Yeah, we could hopefully have another network that's larger than /24
for the arm builders.
I can imagine various network changes should easily allow us to
allocate larger than a /24 to the internal build network.
Yeah.
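
For a sense of scale on the /24 limit discussed above, here is a small
sketch of the arithmetic using Python's standard ipaddress module. The
10.0.0.0 prefixes are placeholders chosen only for illustration, not the
actual build network.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Return the number of assignable host addresses in an IPv4 network.

    Assumes a conventional subnet (/30 or larger), where the network and
    broadcast addresses are not assignable to hosts.
    """
    net = ipaddress.ip_network(cidr)
    return net.num_addresses - 2


if __name__ == "__main__":
    # Placeholder prefixes; each shorter prefix doubles the address space.
    for cidr in ("10.0.0.0/24", "10.0.0.0/23", "10.0.0.0/22"):
        print(cidr, usable_hosts(cidr))
```

So a /24 tops out at 254 builders, a /23 allows 510 and a /22 allows 1022.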
I'm sure some of this will be a process of 'oh no, what we have now
doesn't scale, let's fix it'. Of course some of it we can get ready
for up front too.
yay for planning! :)
Overall I like the idea of the automated builder re-install and
think it will get us more ready for things like a large arm
cluster.
Then I will get crackin' on making it work.
Sounds good.
I wanted to come back around to this discussion to close it out, as we
are most of the way complete here:
In the last few weeks I've set up a system that deploys a new builder,
provisions it and gets it ready in a single command.
It's in the builder git repository. This repo is on lockbox but it is only
accessible to sysadmin-main and sysadmin-releng.
I've posted a site-specific sanitized version of the script I'm using
here:
http://fedorapeople.org/cgit/skvidal/public_git/scripts.git/tree/ansible/start-prov-boot.py
and I'll be happy to post the playbooks I'm using to provision these
hosts.
The repo is restricted b/c it contains some certs/ssl keys that we aren't
going to give away to everyone :)
The process for reinstalling a host is incredibly trivial: we built all
the hosts for the latest mass rebuild using that process. It takes a
single command and you walk away (other than any enabling of the build
in koji).
The next step is to put this process into a cron job so we, ideally, can
reinstall a certain percentage of our builders at any/all times.
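
As a rough illustration of the "certain percentage" idea, the selection
step of such a cron job might look something like the sketch below. This
is not the actual job: the function name, the hostnames and the 10%
figure are all hypothetical, and the real reinstall command is not shown.

```python
import math
import random


def pick_for_reinstall(builders, percent, rng=None):
    """Pick roughly `percent` percent of `builders` at random.

    Hypothetical helper: rounds up so that a non-empty list with a
    positive percentage always yields at least one host.
    """
    rng = rng or random.Random()
    builders = list(builders)
    if not builders or percent <= 0:
        return []
    count = min(len(builders), math.ceil(len(builders) * percent / 100))
    # Sort the sample so the output is stable and easy to read in logs.
    return sorted(rng.sample(builders, count))


if __name__ == "__main__":
    # Made-up hostnames, for illustration only.
    hosts = ["builder%02d" % i for i in range(1, 21)]
    for host in pick_for_reinstall(hosts, 10):
        # A real job would hand each name to the reinstall command here.
        print(host)
```

A real job would also need to skip builders that are busy or already
mid-reinstall; that check is left out of this sketch.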
We're using ansible for all of the command/control and it has been
remarkably stable for our use case. It does require ssh keys on the hosts
but we have that set via kickstarts now for the builders.
After some discussion we took the step of removing FAS and all fedora
accounts from the builders. We couldn't come up with a compelling reason
to keep these throw-away hosts coupled to FAS since the only folks
connecting to them were sysadmin-main/releng - it was a waste of time to
set up and keep the FAS db on the hosts current. Furthermore, it was an
additional risk that a rogue package could try to snatch up our FAS db and
crack the passwords.
If anyone has any questions about how this works or would like any piece
of the infrastructure for doing it (other than the certs/keys :)) please
email to this list and ask.
-sv