The discussion on the devel list about ARM, and my work last week on reinstalling builders quickly and frequently, have raised a number of issues with how we manage our builders and how we should manage them in the future.

It is apparent that if we add ARM builders there will be lots of physical systems (probably in a very small space), but physical nonetheless. So we need a sensible way to manage and reinstall these hosts frequently and quickly.

Additionally, we need to consider what the introduction of a largish number of ARM builders (and other ARM infrastructure) would do to our existing puppet setup: specifically, overload it pretty badly and make it not very manageable.

I'm making certain assumptions here and I'd like to be clear about what those are:

1. that the builders need to be kept pristine
2. that currently our builders are not freshly installed frequently enough
3. that the builders are relatively static in their configuration and most changes are done with pkg additions
4. that builder setups require at least two manual-ish steps by a koji admin who can disable/enable/register the builder with the kojihub
5. that the builders are fairly different networking- and setup-wise from the rest of our systems

So I am proposing that we consider the following as a general process for maintaining our builders:

1. disable the builder in koji
2. make sure all jobs are finished
3. add installer entries into grub (or run the undefine/reinstall process if the builder is virt-based)
4. reinstall the system
5. monitor for ssh to return
6. connect in and force our post-install configuration: identification, network, mount-point setup, ssl certs/keys for koji, etc.
7. reboot
8. re-enable the host in koji

We would do this frequently and regularly, perhaps even having some percentage of our builders doing this at all times. I.e., 1/10th of the boxes reinstalling at any given moment, so that after 10 times the length of one reinstall all of them have been reinstalled.

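To make the proposal a bit more concrete, here is a minimal sketch of the process and the 1/10th rotation, not a finished tool. Everything named in it is an assumption for illustration: the step names, the `rotation_batches`, `wait_for_ssh` and `refresh_builder` helpers, and the idea that each step is a callable taking a hostname. The koji, grub and reinstall actions themselves are deliberately left for the caller to plug in.

```python
import socket
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical step names, one per step of the proposed process, in order.
STEP_ORDER = (
    "disable_in_koji",
    "wait_for_jobs",
    "stage_installer",
    "reinstall",
    "wait_for_ssh",
    "post_install_config",
    "reboot",
    "enable_in_koji",
)


def rotation_batches(builders: Sequence[str], parts: int = 10) -> List[List[str]]:
    """Split builders round-robin into at most `parts` batches.

    Reinstalling one batch at a time keeps roughly 1/parts of the
    builders out of service at any given moment.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    batches: List[List[str]] = [[] for _ in range(parts)]
    for index, name in enumerate(builders):
        batches[index % parts].append(name)
    # Drop empty batches when there are fewer builders than parts.
    return [batch for batch in batches if batch]


def wait_for_ssh(host: str, port: int = 22,
                 timeout: float = 1800.0, interval: float = 10.0) -> bool:
    """Poll until a TCP connection to the ssh port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            # Host is still installing or rebooting; try again shortly.
            time.sleep(interval)
    return False


def refresh_builder(host: str,
                    steps: Dict[str, Callable[[str], None]]) -> List[str]:
    """Run every step in STEP_ORDER against one builder.

    A step signals failure by raising, which stops the run before the
    builder is re-enabled in koji.
    """
    missing = [name for name in STEP_ORDER if name not in steps]
    if missing:
        raise KeyError(f"missing steps: {missing}")
    completed: List[str] = []
    for name in STEP_ORDER:
        steps[name](host)
        completed.append(name)
    return completed
```

A driver would loop over `rotation_batches(all_builders)` and call `refresh_builder` for each host in the current batch before moving on to the next one. Keeping the steps as plain callables means the same loop could serve both physical and virt-based builders, with only the `stage_installer` and `reinstall` steps differing.
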
Additionally, this would mean these systems would NOT have a puppet management piece at all. Security-critical package updates would still be handled by pushes as we do now, but barring the need for significant changes we could rely on the boxes simply being refreshed frequently enough that other updates wouldn't need to be pushed.

What do folks think about this idea? It would dramatically reduce the node entries in our puppet config, and it would drop the number of hosts connecting to puppet, too. It will mean more systems being reinstalled, and more often. It will also require some work to automate the steps I mention above. I think I can achieve that without too much difficulty, actually.

I think, in general, it will increase our ability to scale up to more and more builders.

I'd like input, constructive, please.

Thanks,
-sv