I'd like to propose a PoC of a GitLab-driven functional CI based on GitLab's
custom executor runner feature.

** TL;DR **
- there's some RH-owned HW that can be utilized for upstream functional CI
  running workloads on Red Hat-sponsored distros (see the pipeline)
- the PoC utilizes GitLab's custom executor feature [1] which allows literally
  any infrastructure to be plugged into the CI machinery
- the PoC solution (as proposed) relies on nested virt capability and is
  backed by plain libvirt + QEMU/KVM
- the test environment comprises the libvirt-TCK suite [2] and the Avocado
  test framework [3] as the harness engine (as a wrapper and future
  replacement of the existing Perl implementation)
- an example pipeline (simplified to only a couple of distros) can be found
  here [4]
- libvirt debug logs are exposed as artifacts and can be downloaded on job
  failure

** LONG VERSION **
This RFC is driven by GitLab's custom executor feature which can be integrated
with any VMM and hence with libvirt + QEMU/KVM as well. An example pipeline
can be found here [4].

Architecture
============

Platform
~~~~~~~~
It doesn't matter what platform is used with this PoC, but FWIW the one tested
with this PoC is a baremetal KVM virtualization host provided and owned by
Red Hat, utilizing libvirt, QEMU, a bit of lcitool [5], and libguestfs.

Custom executor
~~~~~~~~~~~~~~~
If you're wondering why this PoC revolves around the custom executor GitLab
runner type [6]: some time ago GitLab decided that instead of adding and
supporting different types of virtualization runners, they'd create a kind of
gateway suitable for basically any solution out there and named it 'custom
executor'. It's nothing more than a bunch of shell scripts categorized into
the various stages of a pipeline job (which will run either on the host or in
the guest depending on how you set it up).

Provisioner
~~~~~~~~~~~
Wait, why is this section needed, just take the example scripts from [1]...
If you look at the example Bash scripts in GitLab's docs on the custom
executor [1] you'll see that they are very straightforward and concise, but a
few things about them are suboptimal:
- creating a new disk image for every single VM instance of the same distro
  out of the virt-builder template
- injecting a bunch of commands directly with virt-builder instead of using
  something more robust like Ansible when there is more than a single VM to be
  created
- a static hostname for every single VM instance
- waiting either for the VM to get an IP or for the SSH connection to become
  available only for a fairly limited fixed period of time

Because I wanted to address all of the ^above pitfalls, I created a
Python-based provisioner relying on libvirt-NSS, transient machines, backing
chain overlay images, cloud-init, etc. instead. You can find the code base for
the provisioner here [7].
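For illustration only, below is a minimal sketch of the general approach (this
is NOT the actual code from [7]; every path, name and the domain XML are made
up): a throwaway qcow2 overlay on top of the shared template, a transient
domain, and polling for a DHCP lease instead of sleeping for a fixed period.
The real provisioner additionally relies on libvirt-NSS and cloud-init.

#!/usr/bin/env python3
# Hypothetical provisioner sketch - paths, domain XML and the cloud-init seed
# are placeholders, see [7] for the real thing.
import subprocess
import time

import libvirt

BASE_IMAGE = "/var/lib/libvirt/images/distro-template.qcow2"  # virt-builder template
OVERLAY    = "/var/lib/libvirt/images/ci-job.qcow2"           # throwaway per-job image
CLOUD_SEED = "/var/lib/libvirt/images/ci-job-cidata.iso"      # pre-built cloud-init seed

DOMAIN_XML = f"""
<domain type='kvm'>
  <name>ci-job</name>
  <memory unit='GiB'>4</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='{OVERLAY}'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='{CLOUD_SEED}'/>
      <target dev='sda' bus='sata'/>
    </disk>
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>
"""


def provision():
    # thin qcow2 overlay on top of the shared template instead of a full copy
    subprocess.run(["qemu-img", "create", "-f", "qcow2",
                    "-F", "qcow2", "-b", BASE_IMAGE, OVERLAY], check=True)

    conn = libvirt.open("qemu:///system")
    dom = conn.createXML(DOMAIN_XML, 0)   # transient domain, gone once destroyed

    # poll for a DHCP lease rather than waiting for a fixed period of time
    for _ in range(120):
        ifaces = dom.interfaceAddresses(
            libvirt.VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_LEASE, 0)
        addrs = [a["addr"] for i in ifaces.values() for a in (i["addrs"] or [])]
        if addrs:
            return dom, addrs[0]
        time.sleep(1)
    raise RuntimeError("VM never acquired an IP address")


if __name__ == "__main__":
    dom, ip = provision()
    print(f"{dom.name()} is reachable at {ip}")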
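The custom executor then only needs thin stage scripts around such a
provisioner. GitLab invokes the configured run_exec with the path to the
generated job script and the name of the stage, so (again, a hypothetical
sketch rather than the code in [7]) the run stage can boil down to streaming
that script into the provisioned VM over SSH:

#!/usr/bin/env python3
# Hypothetical run_exec stage script; the IP handover file is a placeholder
# for whatever state the prepare stage would leave behind.
import subprocess
import sys

script_path, stage = sys.argv[1], sys.argv[2]
vm_ip = open("/tmp/ci-vm.ip").read().strip()

print(f"executing stage '{stage}' inside the VM", file=sys.stderr)

# stream GitLab's generated job script into the test VM
with open(script_path) as script:
    result = subprocess.run(["ssh", f"root@{vm_ip}", "/bin/bash", "-s"],
                            stdin=script)

# a production script would map errors onto GitLab's BUILD_FAILURE_EXIT_CODE /
# SYSTEM_FAILURE_EXIT_CODE; propagating a non-zero status fails the job here
sys.exit(result.returncode)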
Child pipelines & Multiproject CI pipelines ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ child pipeline = a "downstream" (or subpipeline) of the main pipeline useful when there are multiple stages which should be part of a single stage of the main pipeline multiproject pipeline = a pipeline that is kicked off by the main pipeline of your project and runs in context of a different project which your main pipeline can then depend on Technically speaking none of the above 2 GitLab features is needed to really integrate this PoC to upstream libvirt CI, but it was a nice experiment from 3 perspectives: 1) seeing that GitLab's own public SaaS deployment has features from all tiers available 2) it made some job descriptions shorter because e.g. we didn't have to rebuild libvirt in the VM when it was done in the previous step in a container; we don't need to build the Perl bindings (which we need for libvirt-TCK) in the VM either, since we already have a project and hence a pipeline for that already, we just need to harness the results which combined makes the actual VM's purpose more obvious 3) installing libvirt and libvirt-perl using the RPM artifacts built in previous pipeline stages is closer to a real life deployment/environment than re-building everything in the VM followed by doing 'sudo make install' Artifacts ~~~~~~~~~ Apart from the RPMs being published as artifacts from an earlier stage of the pipeline as mentioned in the previous section, in order for this to be any useful to the developers (I know, NOTHING beats a local setup, just be patient with us/me for another while...), when there's a genuine test failure (no infrastructure is failproof), the job description as proposed in patch 4/5 exposes libvirt debug logs as artifacts which can be downloaded and inspected. Test framework ============== For the SW testing part a combination of libvirt-TCK [2] and Avocado [3] were used. The reasoning for this combined test environment was discussed in this thread [8], but essentially we'd like to port most of the Perl-native TCK tests to Python, as more people are familiar with the language, making maintenance of them easier as well as making it easier to contribute new tests to the somewhat ancient test suite. Avocado can definitely help here as it does provide a good foundation and building stones to achieve both. Work in progress ================ One of the things that we already know is problematic with the current hybrid TCK+Avocado setup is that if a test fails, the summary that Avocado produces doesn't quickly tell us which tests have failed and one has to either scroll through the GitLab job output or wade through Avocado job logs. -> this issue is tracked in upstream Avocado and should be addressed soon: https://github.com/avocado-framework/avocado/issues/5200 Another annoying thing in terms of test failure analysis is that TCK in its current shape cannot save libvirt debug logs on a per-test basis and so a massive dump is produced for the whole test run. 
Work in progress
================
One of the things that we already know is problematic with the current hybrid
TCK+Avocado setup is that if a test fails, the summary that Avocado produces
doesn't quickly tell us which tests have failed and one has to either scroll
through the GitLab job output or wade through the Avocado job logs.
-> this issue is tracked in upstream Avocado and should be addressed soon:
   https://github.com/avocado-framework/avocado/issues/5200

Another annoying thing in terms of test failure analysis is that TCK in its
current shape cannot save libvirt debug logs on a per-test basis, so a massive
dump is produced for the whole test run. This is very suboptimal, but it can
only be solved with the libvirt admin API which can enable/modify/redirect
logs at runtime; however, the Perl bindings for it are currently missing.
-> I'm already working on adding the bindings to libvirt-perl; then we need to
   teach TCK to use that feature

Known issues
============
There are already a couple of known test failures which can be seen in the
example pipeline:

1) modular daemons occasionally hang in some test scenarios (doesn't happen
   with monolithic)
   https://bugzilla.redhat.com/show_bug.cgi?id=2044379

2) QEMU tray status detection when a CD media is changed
   -> this one is also intermittent, but no good reproducer data to attach to
      an official QEMU BZ has been harnessed so far

[1] https://docs.gitlab.com/runner/executors/custom.html
[2] https://gitlab.com/libvirt/libvirt-tck/
[3] https://avocado-framework.readthedocs.io/
[4] https://gitlab.com/eskultety/libvirt/-/pipelines/457836923
[5] https://gitlab.com/libvirt/libvirt-ci/
[6] https://docs.gitlab.com/runner/executors/
[7] https://gitlab.com/eskultety/libvirt-gitlab-executor
[8] https://listman.redhat.com/archives/libvir-list/2021-June/msg00836.html

Erik Skultety (4):
  ci: manifest: Allow RPM builds on CentOS Stream 8
  ci: containers: Add CentOS Stream 9 target
  ci: manifest: Publish RPMs as artifacts on CentOS Stream and Fedoras
  gitlab-ci: Introduce a new test 'integration' pipeline stage

 .gitlab-ci-integration.yml               | 116 +++++++++++++++++++++++
 .gitlab-ci.yml                           |  17 +++-
 ci/containers/centos-stream-9.Dockerfile |  87 +++++++++++++++++
 ci/gitlab.yml                            |  33 ++++++-
 ci/manifest.yml                          |  26 ++++-
 5 files changed, 274 insertions(+), 5 deletions(-)
 create mode 100644 .gitlab-ci-integration.yml
 create mode 100644 ci/containers/centos-stream-9.Dockerfile

-- 
2.34.1