On Mon, Feb 27, 2023 at 8:33 PM Thorsten Glaser <t.glaser@xxxxxxxxx> wrote:
>
> On Mon, 27 Feb 2023, Nico Kadel-Garcia wrote:
>
> >> > does any one of you have a best practice on renewing ssh host keys
> >> > on cloned machines?
> >>
> >> Yes: not cloning machines.
> >
> >Good luck with *that*. Building VM's from media is a far, far too
> >lengthy process for production deployment, especially for auto-scaling
> >clusters.
>
> (It’s “VMs”, no genitive apostrophe.)

OK, point.

> What media? debootstrap + a local mirror = fast.
> In fact, much faster than cloning, possibly large, filesystems,
> unless you use CoW, which you don’t because then you’re overcommitting.

Sure, I was doing that sort of "local build into a chroot cage" stunt in
1999. It's re-inventing the wheel, and using a 3-D printer to make it,
when you already have a very broad variety of off-site VM images, and
well-defined tools for deploying them directly. I suspect that most of us
have better things to do with our time than maintain a local mirror when
our friends in every cloud center on the planet have already done the work.

> >> There’s too many things to take care of for these. The VM UUID in
> […]
>
> >That's what the "sysprep" procedure is for when generating reference
> >VM images, and "cloud-utils" for setting up new VMs from images, at
>
> What guarantees you “sysprep” and “cloud-utils” find everything that
> needs to be changed?

What makes your customized, hand-written, internal versions of such tools
better, or more reliable, than a consistently and effectively used open
source tool?

> (I’m not sure where inode generation numbers are (still) a concern,
> on what filesystems, anyway. They only come into play with NFS,
> AFAIK, though, so that limits this issue. When they come into play,
> however, they’re hard to change without doing a newfs(8)…)

They exist on other filesystems too. I've not really gone digging into
them; they Just Work(tm) with the imaging tools applied to the VM images.

> >If people really feel the need for robust random number services,
> >they've got other problems. I'd suggest they either apply an init
> >script to reset whatever they feel they need on every reboot, or find
>
> I think you’re downplaying a very real problem here, as an aside.

In the last 35 years, I've only seen anyone care much about the RNG.....
twice. Those hosts wound up with physical random number generators in
PCI slots, and that was years ago.

> >The more host-by-host customization, admittedly the more billable
> >hours and the more of yourself personally into each and every step. But
> >it doesn't scale
>
> Huh? Scripting that creation from scratch is a job done once that
> scales very well. debootstrap is reasonably fast, installation of
> additional packages can be fast as well (since it’s a new image,
> use eatmydata or the dpkg option I’ve not yet remembered).

I've been the guy who had to do it with large deployments, up to about
20,000 hosts of quite varied hardware from different vendors with
different specs. I do believe they replaced my tools after about 20
years, when someone found an effective open source tool. That especially
included kernel updates to support the newer platforms. I have stories
from when someone deployed out-of-date OS images remotely on top of the
vendor image we gave hardware vendors for initial deployment. Being able
to use a vendor's already existing tools, such as every cloud provider's
tools, saves a *lot* of time that would otherwise go to re-inventing such
wheels.
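To be clear about the "build it with debootstrap" approach being proposed
above, it's roughly something like the following; the suite, target path,
and mirror URL here are placeholders of mine, point the mirror at whatever
local mirror you actually maintain:

    # minimal Debian tree into a target directory, from a (local) mirror
    debootstrap --variant=minbase bookworm /srv/new-image http://deb.debian.org/debian
    # then layer on whatever the image needs, for example:
    chroot /srv/new-image apt-get install -y --no-install-recommends openssh-server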
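And back on the thread's original question of renewing host keys on
cloned machines: whichever way the image is produced, the usual fix is a
first-boot step that throws away the keys baked into the image and lets
sshd generate fresh ones. A rough sketch, assuming a Debian-ish image
(cloud-init's ssh_deletekeys handling amounts to much the same thing):

    #!/bin/sh
    # first boot only: drop the host keys that shipped with the image
    rm -f /etc/ssh/ssh_host_*_key /etc/ssh/ssh_host_*_key.pub
    # regenerate all default host key types
    ssh-keygen -A
    # restart sshd so it picks up the new keys ("sshd" on some distros)
    systemctl restart ssh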
> And, given the system’s all-new, I believe this is even more
> reliable than cloning something customised, then trying to
> adapt *that* to the new requirements.

There are trade-offs. One is that skew in the OS-building tools can lead
to skew among the images. Another is that what you describe does not
scale automatically and horizontally for commercial auto-scaling setups,
which are almost always VM-image based these days and wind up either with
identical host keys or with mismatched host keys for the same re-allocated
IP address; it's work to resolve either problem. It's much, much simpler,
and more stable, to simply ignore known_hosts and spend your time on the
management of user public keys, which is generally the far greater risk.

> >, and you will eventually be told to stop wasting your
> >time if your manager is attentive to how much time you're burning on
> >each deployment.
>
> If I’ve scripted the image creation, it’s no more work than
> a cloning approach.

Been there, done that, and it keeps needing tweaking.

> >> This is even more true as every new machine tends to get just the
> >> little bit of difference from the old ones that is easier to make
> >> when not cloning (such as different filesystem layout, software).
>
> >And *that* is one of the big reasons for virtualization based
> >deployments, so people can stop caring about the physical subtleties.
>
> ?!?!?!
>
> How does that translate into needing, say, 8 GiB HDD for some VMs but
> 32 GiB HDD for some others?

Consistently create small images, then expand them to use the available
disk space with an init script embedded in the image. Remember when I
mentioned 20,000 hosts at a time? It's admittedly similar to the work
needed for deploying images to new hardware.

> This has *NOTHING* to do with physical vs virtual platforms.

Virtual platforms label the disks and partitions fairly consistently.
Convincing /etc/fstab to work for newly deployed hardware can be....
tricky, if the image deployment uses distinct drive labeling. Been there,
done that, have scar tissue from the Promise SATA controller drivers that
renumbered the /dev/sd* drives in the kernel so that their add-on card got
the first drive labels. Drove me *nuts* untangling that one, because it
depended on which kernel you used.

> >predicted, nor was reverse DNS likely to work at all which was its own
> >distinct burden for logging *on* those remote servers.
>
> Maybe invest your time into fixing infrastructure then…

The reverse DNS was not my infrastructure to fix. When you host servers
remotely, convincing the remote datacenter to do reverse DNS correctly
is.... not always an effective use of time.

> >> (Fun fact on the side, while doing admin stuff at $dayjob, I even
> […]
>
> >You probably don't work on the same scale I've worked, or had to
>
> No, not for that. If I had to do it at larger scale I would have
> scripted it. I didn’t, so it turned out to be cheaper, work-time-wise,
> to do part of the steps by hand the few times I needed to do it.
> I don’t admin stuff at work for others any more, so that point is
> moot. But I did want to share this as an anecdote: when scaling very
> small, the “stupid” solution may be better than a clever one.

Yeah, for one-offs or small tasks it can just be faster to use a few
lines of shell or even a manual step. I've been dealing with bulky
environments where infrastructure as code is vital.
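To make the "create small images, expand them with an init script" point
above concrete, the expansion step is roughly this, using growpart from
cloud-utils; the device names are illustrative and the filesystem is
assumed to be ext4:

    #!/bin/sh
    # grow partition 1 of the root disk to fill whatever disk this VM got
    growpart /dev/sda 1
    # then grow the filesystem to match the enlarged partition
    resize2fs /dev/sda1

For what it's worth, mounting by UUID or filesystem label in /etc/fstab
sidesteps that kind of drive renumbering (the UUID below is a placeholder;
read the real one with blkid):

    UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  defaults  0  1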
Spending the time to ensure individualized host keys, when there's a
significant chance of IP re-use and conflicting host keys to clean up,
is... well, it's time better spent elsewhere. It's why tools like
"ansible" typically disable the known_hosts file at run time with just
the ssh_config settings I mentioned. They don't have the time to manually
validate SSH host key conflicts when deploying new servers.

> >It's also a great way to stretch
> >our billable hours with very familiar tasks which only you know how to
> >do.
>
> I don’t need to do that. Besides, I’m employed, not freelancing,
> so I don’t even have to care about billable hours.
>
> bye,
> //mirabilos

Well, good for you. I'm sad to say I've seen people chortling over how
very, very, very clever they were with deliberately hand-tuned setups to
assert their complete mastery over their turf, and been brought in a few
times to stabilize the mess when they left. It's led me to a lot of "keep
it very, very simple" steps like "don't bother using known_hosts".
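For reference, the kind of settings I mean look like this; host_key_checking
is ansible's documented knob, and the OpenSSH options are the standard
equivalents (a sketch, not a blanket recommendation for every environment):

    # ansible.cfg
    [defaults]
    host_key_checking = False

    # or directly with OpenSSH:
    ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null user@newhost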