On Fri, Jun 24, 2022 at 10:21:50PM -0400, Theodore Ts'o wrote: > On Fri, Jun 24, 2022 at 03:54:44PM -0700, Luis Chamberlain wrote: > > > > Perhaps I am not understanding what you are suggesting with a VM native > > solution. What do you mean by that? A full KVM VM inside the cloud? > > "Cloud native" is the better way to put things. Cloud VM's > are designed to be ephemeral, so the concept of "node bringup" really > doesn't enter into the picture. > > When I run the "build-appliance" command, this creates a test > appliance image. Which is to say, we create a root file system image, > and then "freeze" it into a VM image. So this seems to build an image from a base distro image. Is that right? And it would seem your goal is to store that image then after so it can be re-used. > For kvm-xfstests this is a qcow image which is run in snapshot mode, > which means that if any changes is made to the root file system, those > changes disappear when the VM exits. Sure, so you use one built image once, makes sense. You are optimizing usage for GCE. That makes sense. The goal behind kdevops was to use technology which can *enable* any optimizations in a cloud agnostic way. What APIs become public is up to the cloud provider, and one cloud agnostic way to manage cloud solutions using open source tools is with terraform and so that is used today. If an API is not yet avilable through terraform kdevops could simply use whatever cloud tool for additional hooks. But having the ability to ramp up regardless of cloud provider was extremely important to me from the beginning. Optimizing is certainly possible, always :) Likewise, if you using local virtualized, we can save vagrant images in the vagrant cloud, if we wanted, which would allow pre-built setups saved: https://app.vagrantup.com/boxes/search That could reduce speed for when doing bringup for local KVM / Virtualbox guests. In fact since vagrant images are also just tarballs with qcow2 files, I do wonder if they can be also leveraged for cloud deployments. Or if the inverse is true, if your qcow2 images can be used for vagrant purposes as well. If you're curious: https://github.com/linux-kdevops/kdevops/blob/master/docs/custom-vagrant-boxes.md What approach you use is up to you. From a Linux distribution perspective being able to do reproducible builds was important too, and so that is why a lot of effort was put to ensure how you cook up a final state from an initial distro release was supported. > I can very quickly have over 100 test VM's running in parallel, and as > the tests complete, they are automatically shutdown and destroyed ---- > which means that we don't store state in the VM. Instead the state is > stored in a Google Cloud Storage (Amazon S3) bucket, with e-mail sent > with a summary of results. Using cloud object storage is certainly nice if you can afford it. I think it is valuable, but likewise should be optional. And so with kdevops support is welcomed should someone want to do that. And so what you describe is not impossible with kdevops it is just not done today, but could be enabled. > VM's can get started much more quickly than "make bringup", since > we're not running puppet or ansible to configure each node. You can easily just use pre-built images as well instead of doing the build from a base distro release, just as you could use custom vagrant images for local KVM guests. The usage of ansible to *build* fstests and install can be done once too and that image saved, exported, etc, and then re-used. The kernel config I maintain on kdevops has been tested to work on local KVM virtualization setups, but also all supported cloud providers as well. So I think there is certainly value in learning from the ways you optimizing cloud usage for GCE and generalizing that for *any* cloud provider. The steps to get to *build* an image from a base distro release is glanced over but that alone takes effort and is made pretty well distro agnostic within kdevops too. > In contrast, I can just run "gce-xfstests ls-results" to see all of > the results that has been saved to GCS, and I can fetch a particular > test result to my laptop via a single command: "gce-xfstests > get-results tytso-20220624210238". No need to ssh to a host node, and > then ssh to the kdevops test node, yadda, yadda, yadda --- and if you > run "make destroy" you lose all of the test result history on that node, > right? Actually all the *.bad, *.dmesg as well as final xunit results for all nodes for failed tests is copied over locally to the host which is running kdevops. Xunit files are also merged to represent a final full set of results too. So no not destroyed. If you wanted to keep all files even for non-failed stuff we can add that as a new Kconfig bool. Support for stashing results into object storage sure would be nice, agreed. > See the difference? Yes you have optimized usage of GCE. Good stuff, lots to learn from that effort! Thanks for sharing the details! Luis