Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Sat, 25 Jun 2022 11:49:54 -0700

On Fri, Jun 24, 2022 at 10:21:50PM -0400, Theodore Ts'o wrote:
> On Fri, Jun 24, 2022 at 03:54:44PM -0700, Luis Chamberlain wrote:
> > 
> > Perhaps I am not understanding what you are suggesting with a VM native
> > solution. What do you mean by that? A full KVM VM inside the cloud?
> 
> "Cloud native" is the better way to put things.  Cloud VM's
> are designed to be ephemeral, so the concept of "node bringup" really
> doesn't enter into the picture.
> 
> When I run the "build-appliance" command, this creates a test
> appliance image.  Which is to say, we create a root file system image,
> and then "freeze" it into a VM image.

So this builds an image from a base distro image, is that right?
And it would seem your goal is to then store that image so it can
be re-used.

> For kvm-xfstests this is a qcow image which is run in snapshot mode,
> which means that if any changes is made to the root file system, those
> changes disappear when the VM exits.

Sure, so you use one built image once, makes sense.

You are optimizing usage for GCE. That makes sense. The goal behind
kdevops was to use technology which can *enable* any optimizations in
a cloud agnostic way. What APIs become public is up to the cloud
provider, and one cloud agnostic way to manage cloud solutions using
open source tools is with terraform and so that is used today. If an API
is not yet available through terraform, kdevops could simply use
whatever cloud-specific tool is needed for additional hooks. But having
the ability to ramp up regardless of cloud provider was extremely
important to me from the beginning.

Optimizing is certainly possible, always :)

Likewise, if you are using local virtualization, we could save vagrant
images in the vagrant cloud, if we wanted, which would allow pre-built
setups to be saved there:

https://app.vagrantup.com/boxes/search

That could reduce the time it takes to do bringup for local KVM /
Virtualbox guests.

In fact, since vagrant images are also just tarballs with qcow2 files,
I do wonder if they can also be leveraged for cloud deployments. Or the
inverse, whether your qcow2 images can be used for vagrant purposes as
well. If you're curious:

https://github.com/linux-kdevops/kdevops/blob/master/docs/custom-vagrant-boxes.md
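
To illustrate the "tarball with a qcow2 file" point, here is a rough
Python sketch that scans a box tarball for members starting with the
qcow magic bytes ("QFI" followed by 0xfb). The function names and the
demo member names are made up for illustration, this is not kdevops or
vagrant code:

```python
import io
import tarfile

QCOW_MAGIC = b"QFI\xfb"  # first four bytes of a qcow / qcow2 image


def find_qcow_members(box_file):
    """Return the names of tar members whose data starts with the qcow magic.

    box_file is a seekable binary file object for the (possibly
    compressed) box tarball.
    """
    found = []
    with tarfile.open(fileobj=box_file, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            data = tar.extractfile(member)
            if data is not None and data.read(4) == QCOW_MAGIC:
                found.append(member.name)
    return found


def make_demo_box():
    """Build a tiny in-memory stand-in for a box (hypothetical member names)."""
    buf = io.BytesIO()
    members = (
        ("metadata.json", b"{}"),
        ("disk.img", QCOW_MAGIC + b"\0" * 28),  # fake header, not a real disk
    )
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(find_qcow_members(make_demo_box()))
```

Against a real box you would pass open("some.box", "rb") instead of the
demo buffer. Whether the disk found this way is directly usable as a
cloud image is a separate question this does not answer.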

What approach you use is up to you. From a Linux distribution perspective
being able to do reproducible builds was important too, and so that is
why a lot of effort was put into ensuring that cooking up a final state
from an initial distro release is supported.

> I can very quickly have over 100 test VM's running in parallel, and as
> the tests complete, they are automatically shutdown and destroyed ----
> which means that we don't store state in the VM.  Instead the state is
> stored in a Google Cloud Storage (Amazon S3) bucket, with e-mail sent
> with a summary of results.

Using cloud object storage is certainly nice if you can afford it. I
think it is valuable, but likewise it should be optional. So support for
it in kdevops is welcomed should someone want to do that. What you
describe is not impossible with kdevops, it is just not done today, but
could be enabled.

> VM's can get started much more quickly than "make bringup", since
> we're not running puppet or ansible to configure each node.

You can easily just use pre-built images as well instead of doing
the build from a base distro release, just as you could use custom
vagrant images for local KVM guests.

The usage of ansible to *build* and install fstests can be done once
too, and that image saved, exported, etc, and then re-used. The kernel
config I maintain on kdevops has been tested to work on local KVM
virtualization setups, and on all supported cloud providers as well.

So I think there is certainly value in learning from the ways you are
optimizing cloud usage for GCE and generalizing that for *any* cloud
provider.

The steps to *build* an image from a base distro release are
glossed over, but that alone takes effort and is made pretty well
distro agnostic within kdevops too.

> In contrast, I can just run "gce-xfstests ls-results" to see all of
> the results that has been saved to GCS, and I can fetch a particular
> test result to my laptop via a single command: "gce-xfstests
> get-results tytso-20220624210238".  No need to ssh to a host node, and
> then ssh to the kdevops test node, yadda, yadda, yadda --- and if you
> run "make destroy" you lose all of the test result history on that node,
> right?

Actually all the *.bad and *.dmesg files, as well as the final xunit
results for failed tests on all nodes, are copied over locally to the
host which is running kdevops. Xunit files are also merged to represent
a final full set of results too. So no, not destroyed. If you wanted to
keep all files even for non-failed stuff we can add that as a new
Kconfig bool.
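
For a rough idea of what merging per-node xunit files amounts to, here
is a minimal Python sketch. It assumes each file has a single
<testsuite> root with the usual tests / failures / errors / skipped
counters and <testcase> children. It is only an illustration, not the
code kdevops actually runs, and merge_xunit is a made-up name:

```python
import xml.etree.ElementTree as ET

# Counter attributes commonly found on an xunit <testsuite> element.
COUNTERS = ("tests", "failures", "errors", "skipped")


def merge_xunit(xml_docs, name="merged"):
    """Merge several xunit documents (as strings) into one <testsuite>.

    Counters are summed and every <testcase> is carried over unchanged.
    A counter missing from an input is treated as 0.
    """
    merged = ET.Element("testsuite", {"name": name})
    totals = dict.fromkeys(COUNTERS, 0)
    for doc in xml_docs:
        suite = ET.fromstring(doc)
        for key in COUNTERS:
            totals[key] += int(suite.get(key, 0))
        merged.extend(suite.findall("testcase"))
    for key, value in totals.items():
        merged.set(key, str(value))
    return merged


if __name__ == "__main__":
    node_a = (
        '<testsuite name="node-a" tests="2" failures="1" skipped="0">'
        '<testcase name="generic/001"/>'
        '<testcase name="generic/002"><failure/></testcase>'
        "</testsuite>"
    )
    node_b = (
        '<testsuite name="node-b" tests="1" failures="0" skipped="1">'
        '<testcase name="generic/003"><skipped/></testcase>'
        "</testsuite>"
    )
    print(ET.tostring(merge_xunit([node_a, node_b]), encoding="unicode"))
```

The same merged element could then be written out once on the host, or
pushed to object storage if that support gets added.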

Support for stashing results into object storage sure would be nice, agreed.

> See the difference?

Yes you have optimized usage of GCE. Good stuff, lots to learn from that effort!
Thanks for sharing the details!

  Luis