I'd like to propose a PoC of a GitLab-driven functional CI based on GitLab's
custom executor runner feature.

** TL;DR **
- there's some RH-owned HW that can be utilized for upstream functional CI
  running workloads on Red Hat-sponsored distros (see the pipeline)
- the PoC utilizes GitLab's custom executor feature [1] which allows literally
  any infrastructure to be plugged into the CI machinery
- the PoC solution (as proposed) relies on nested virt capability and is
  backed by plain libvirt + QEMU/KVM
- the test environment comprises the libvirt-TCK suite [2] and the Avocado
  test framework [3] as the harness engine (as a wrapper and future
  replacement of the existing Perl implementation)
- an example pipeline (simplified to only a couple of distros) can be found
  here [4]
- libvirt debug logs are exposed as artifacts and can be downloaded on job
  failure

** LONG VERSION **
This RFC is driven by GitLab's custom executor feature which can be integrated
with any VMM and hence with libvirt + QEMU/KVM as well. An example pipeline
can be found here [4].

Architecture
============

Platform
~~~~~~~~
It doesn't matter what platform is used with this PoC, but FWIW the one tested
with this PoC is a baremetal KVM virtualization host provided and owned by
Red Hat, utilizing libvirt, QEMU, a bit of lcitool [5], and libguestfs.

Custom executor
~~~~~~~~~~~~~~~
If you're wondering why this PoC revolves around the custom executor GitLab
runner type [6]: some time ago GitLab decided that instead of adding and
supporting different types of virtualization runners, they'd create a kind of
gateway suitable for basically any solution out there and named it 'custom
executor'. It's nothing more than a bunch of shell scripts categorized into
the various stages of a pipeline job (which will run either on the host or in
the guest depending on how you set it up).

Provisioner
~~~~~~~~~~~
Wait, why is this section needed, just take the example scripts from [1]...
If you look at the example Bash scripts in GitLab's docs on the custom
executor [1] you'll see that they are very straightforward and concise, but a
few things about them are suboptimal:
- creating a new disk image for every single VM instance of the same distro
  out of the virt-builder template
- injecting a bunch of commands directly with virt-builder instead of using
  something more robust like Ansible when there is more than a single VM to be
  created
- a static hostname for every single VM instance
- waiting either for the VM to get an IP or for the SSH connection to become
  available only for a fairly limited fixed period of time

Because I wanted to address all of the ^above pitfalls, I created a
Python-based provisioner relying on libvirt-NSS, transient machines, backing
chain overlay images, cloud-init, etc. instead. You can find the code base for
the provisioner here [7].
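For illustration only, below is a minimal sketch of the general approach (this
is NOT the actual code from [7]; every path, name and the domain XML are made
up): a throwaway qcow2 overlay on top of the shared template, a transient
domain, and polling for a DHCP lease instead of sleeping for a fixed period.
The real provisioner additionally relies on libvirt-NSS and cloud-init.

#!/usr/bin/env python3
# Hypothetical provisioner sketch - paths, domain XML and the cloud-init seed
# are placeholders, see [7] for the real thing.
import subprocess
import time

import libvirt

BASE_IMAGE = "/var/lib/libvirt/images/distro-template.qcow2"  # virt-builder template
OVERLAY    = "/var/lib/libvirt/images/ci-job.qcow2"           # throwaway per-job image
CLOUD_SEED = "/var/lib/libvirt/images/ci-job-cidata.iso"      # pre-built cloud-init seed

DOMAIN_XML = f"""
<domain type='kvm'>
  <name>ci-job</name>
  <memory unit='GiB'>4</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='{OVERLAY}'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='{CLOUD_SEED}'/>
      <target dev='sda' bus='sata'/>
    </disk>
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>
"""


def provision():
    # thin qcow2 overlay on top of the shared template instead of a full copy
    subprocess.run(["qemu-img", "create", "-f", "qcow2",
                    "-F", "qcow2", "-b", BASE_IMAGE, OVERLAY], check=True)

    conn = libvirt.open("qemu:///system")
    dom = conn.createXML(DOMAIN_XML, 0)   # transient domain, gone once destroyed

    # poll for a DHCP lease rather than waiting for a fixed period of time
    for _ in range(120):
        ifaces = dom.interfaceAddresses(
            libvirt.VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_LEASE, 0)
        addrs = [a["addr"] for i in ifaces.values() for a in (i["addrs"] or [])]
        if addrs:
            return dom, addrs[0]
        time.sleep(1)
    raise RuntimeError("VM never acquired an IP address")


if __name__ == "__main__":
    dom, ip = provision()
    print(f"{dom.name()} is reachable at {ip}")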
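The custom executor then only needs thin stage scripts around such a
provisioner. GitLab invokes the configured run_exec with the path to the
generated job script and the name of the stage, so (again, a hypothetical
sketch rather than the code in [7]) the run stage can boil down to streaming
that script into the provisioned VM over SSH:

#!/usr/bin/env python3
# Hypothetical run_exec stage script; the IP handover file is a placeholder
# for whatever state the prepare stage would leave behind.
import subprocess
import sys

script_path, stage = sys.argv[1], sys.argv[2]
vm_ip = open("/tmp/ci-vm.ip").read().strip()

print(f"executing stage '{stage}' inside the VM", file=sys.stderr)

# stream GitLab's generated job script into the test VM
with open(script_path) as script:
    result = subprocess.run(["ssh", f"root@{vm_ip}", "/bin/bash", "-s"],
                            stdin=script)

# a production script would map errors onto GitLab's BUILD_FAILURE_EXIT_CODE /
# SYSTEM_FAILURE_EXIT_CODE; propagating a non-zero status fails the job here
sys.exit(result.returncode)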
Child pipelines & Multiproject CI pipelines ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ child pipeline = a "downstream" (or subpipeline) of the main pipeline useful when there are multiple stages which should be part of a single stage of the main pipeline multiproject pipeline = a pipeline that is kicked off by the main pipeline of your project and runs in context of a different project which your main pipeline can then depend on Technically speaking none of the above 2 GitLab features is needed to really integrate this PoC to upstream libvirt CI, but it was a nice experiment from 3 perspectives: 1) seeing that GitLab's own public SaaS deployment has features from all tiers available 2) it made some job descriptions shorter because e.g. we didn't have to rebuild libvirt in the VM when it was done in the previous step in a container; we don't need to build the Perl bindings (which we need for libvirt-TCK) in the VM either, since we already have a project and hence a pipeline for that already, we just need to harness the results which combined makes the actual VM's purpose more obvious 3) installing libvirt and libvirt-perl using the RPM artifacts built in previous pipeline stages is closer to a real life deployment/environment than re-building everything in the VM followed by doing 'sudo make install' Artifacts ~~~~~~~~~ Apart from the RPMs being published as artifacts from an earlier stage of the pipeline as mentioned in the previous section, in order for this to be any useful to the developers (I know, NOTHING beats a local setup, just be patient with us/me for another while...), when there's a genuine test failure (no infrastructure is failproof), the job description as proposed in patch 4/5 exposes libvirt debug logs as artifacts which can be downloaded and inspected. Test framework ============== For the SW testing part a combination of libvirt-TCK [2] and Avocado [3] were used. The reasoning for this combined test environment was discussed in this thread [8], but essentially we'd like to port most of the Perl-native TCK tests to Python, as more people are familiar with the language, making maintenance of them easier as well as making it easier to contribute new tests to the somewhat ancient test suite. Avocado can definitely help here as it does provide a good foundation and building stones to achieve both. Work in progress ================ One of the things that we already know is problematic with the current hybrid TCK+Avocado setup is that if a test fails, the summary that Avocado produces doesn't quickly tell us which tests have failed and one has to either scroll through the GitLab job output or wade through Avocado job logs. -> this issue is tracked in upstream Avocado and should be addressed soon: https://github.com/avocado-framework/avocado/issues/5200 Another annoying thing in terms of test failure analysis is that TCK in its current shape cannot save libvirt debug logs on a per-test basis and so a massive dump is produced for the whole test run. 
Work in progress
================
One of the things that we already know is problematic with the current hybrid
TCK+Avocado setup is that if a test fails, the summary that Avocado produces
doesn't quickly tell us which tests have failed and one has to either scroll
through the GitLab job output or wade through the Avocado job logs.
-> this issue is tracked in upstream Avocado and should be addressed soon:
   https://github.com/avocado-framework/avocado/issues/5200

Another annoying thing in terms of test failure analysis is that TCK in its
current shape cannot save libvirt debug logs on a per-test basis, so a massive
dump is produced for the whole test run. This is very suboptimal, but it can
only be solved with the libvirt admin API which can enable/modify/redirect
logs at runtime; however, the Perl bindings for it are currently missing.
-> I'm already working on adding the bindings to libvirt-perl; then we need to
   teach TCK to use that feature

Known issues
============
There are already a couple of known test failures which can be seen in the
example pipeline:

1) modular daemons occasionally hang in some test scenarios (doesn't happen
   with monolithic)
   https://bugzilla.redhat.com/show_bug.cgi?id=2044379

2) QEMU tray status detection when a CD media is changed
   -> this one is also intermittent, but no good reproducer data to attach to
      an official QEMU BZ has been harnessed so far

[1] https://docs.gitlab.com/runner/executors/custom.html
[2] https://gitlab.com/libvirt/libvirt-tck/
[3] https://avocado-framework.readthedocs.io/
[4] https://gitlab.com/eskultety/libvirt/-/pipelines/457836923
[5] https://gitlab.com/libvirt/libvirt-ci/
[6] https://docs.gitlab.com/runner/executors/
[7] https://gitlab.com/eskultety/libvirt-gitlab-executor
[8] https://listman.redhat.com/archives/libvir-list/2021-June/msg00836.html

Erik Skultety (4):
  ci: manifest: Allow RPM builds on CentOS Stream 8
  ci: containers: Add CentOS Stream 9 target
  ci: manifest: Publish RPMs as artifacts on CentOS Stream and Fedoras
  gitlab-ci: Introduce a new test 'integration' pipeline stage

 .gitlab-ci-integration.yml               | 116 +++++++++++++++++++++++
 .gitlab-ci.yml                           |  17 +++-
 ci/containers/centos-stream-9.Dockerfile |  87 +++++++++++++++++
 ci/gitlab.yml                            |  33 ++++++-
 ci/manifest.yml                          |  26 ++++-
 5 files changed, 274 insertions(+), 5 deletions(-)
 create mode 100644 .gitlab-ci-integration.yml
 create mode 100644 ci/containers/centos-stream-9.Dockerfile

-- 
2.34.1