Re: [libvirt PATCH 00/20] ci: Move GitLab build recipes to a standalone script

Erik Skultety <eskultet@xxxxxxxxxx> · Tue, 7 Feb 2023 10:56:59 +0100

On Mon, Feb 06, 2023 at 05:19:52PM +0100, Erik Skultety wrote:
> On Mon, Feb 06, 2023 at 03:45:12PM +0000, Daniel P. Berrangé wrote:
> > On Mon, Feb 06, 2023 at 02:52:57PM +0100, Erik Skultety wrote:
> > > This is a follow up to:
> > > https://listman.redhat.com/archives/libvir-list/2023-January/237201.html
> > > 
> > > The effort here is to unify the way builds/tests are executed in GitLab CI vs
> > > local container executions and make another step forward in terms of
> > > reproducibility of (specifically) GitLab environments.
> > > 
> > > Even though code to run all but one (coverity) jobs from GitLab via the
> > > build.sh script is added with this series, local behavior remains the same as
> > > before this series. The reason for that is that that will require more patches
> > > ridding of the Makefile which is currently used and instead integrate usage of
> > > lcitool with the ci/helper Python script which is currently the entry point for
> > > local container executions.
> > 
> > snip
> > 
> > >  .gitlab-ci.yml |  56 ++++++++++-------------
> > >  ci/Makefile    |  16 ++++---
> > >  ci/build.sh    | 121 +++++++++++++++++++++++++++++++++++++++++++------
> > >  ci/helper      |  21 ++++++---
> > >  4 files changed, 155 insertions(+), 59 deletions(-)
> > >  mode change 100644 => 100755 ci/build.sh
> > 
> > I'm really super unenthusiastic about this change, and also the similar
> > change to add an ci/integration.sh. Shell scripting is something we
> > worked hard to eliminate from libvirt. It is an awful language
> > to do anything non-trivial with, error handling, lack of data
> > structures, variable handling, portability are all bug generators.
> > 
> > I know the .gitlab-ci.yml  'script' commands are technically shell
> > script, but they are pretty trivial bits and don't have to worry
> > about portability for dash vs bash etc, or complex control logic.
> > The majority of it is just a simple list of commands to invoke,
> > with the occasional conditional.
> > 
> > The build.sh script is by contrast significantly more complex. By
> > the very nature of taking "N" separate gitlab jobs and multiplexing
> > them onto the one shell script, we're now having todo command line
> > arg parsing in shell and de-multiplexing back to commands. The CI
> > logic is now split up across even more sources - the gitlab config,
> > the build.sh script and the meson.build files, which I think is
> > worse for maintaining this too. 
> 
> True, except that even despite that, and I know what I'm talking about since I
> wrote the integration jobs templates, GitLab is super picky; you can't debug
> the syntax problems properly; certain common syntactic sugars don't work and
> have to be replaced by less common counterparts; using YAML for bash formatting
> leads to many table-flipping moments; you have to wait for Nx10 minutes before
> it fails with some ridiculous error you would have caught early locally but you
> couldn't because of the stage and artifact dependencies. So, once you have it
> finally in place it's unarguably nice, but the road towards that isn't really
> rainbow and unicorns either, it's still shell after all. Since I'm mainly
> working on this part of libvirt, I myself would appreciate if I could easily
> just run a single script instead of copy pasting commands one-by-one to a local
> VM to catch stupid errors quickly rather than wait for GitLab to fail.
> 
> > 
> > I appreciate the goal of being able to run CI commands locally, but
> > I'm not feeling this approach is a net win. I'd much rather just
> > having developers invoke the meson command directly, which is not
> > that hard.
> 
> Well, the big picture here is about making running integration tests locally
> less painful than it is now...plus increase the amount of automation involved.
> That's the major driver here - libvirt needs to be built locally in a safe
> environment before integration tests could be run against it without touching
> the host. So, with this script, while I agree with everything you said about
> Bash, was just one step towards getting the automation I mentioned in place. I
> also considered Python (yeah..why), but that would be super akward.
> 
> TL;DR: if we don't do anything about how we currently maintain the build
> recipes (note we're maintaining 2 already), everybody will continue ignoring the
> integration test suite, unless we enable merge requests where the status quo
> would be somewhat (but not completely) eliminated. With integration tests you
> can't ignore the aspect of getting feedback early compared to waiting for
> GitLab CI.
> 
> > 
> > If we do really want to provide something that 100% mirrors the
> > gitlab CI job commands though, I'm even more convinced now that
> > we should be using the .gitlab-ci.yml as the canonical source.
> > 
> > Since I last mentioned that idea, I found out that this should
> > be something we can achieve via the gitlab runner 'exec' command.
> 
> Haven't heard of ^this one, but I wonder how we can get something meaningful
> out of it for the purposes I mentioned above.
> 
> Erik

So, I had a brief look at gihttps://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797tlab-runner exec this morning. Long story short,
it's deprecated [1] and though it was scheduled for complete removal, that plan
was put on hold for now. Nevertheless, it's not getting any features that are
introduced to .gitlab-ci.yml. The gist is that it doesn't understand 'include'
nor 'extends' which we're using heavily across our gitlab configuration
hierarchy, so it's a no-go. It also doesn't support artifacts in any form, so
while it technically should be possible to save RPM builds in a volume (not
sure if the bind mount is cleared on job completion or not) we could not pass
them into a VM running the integration tests the same convenient way as we do
in the CI.

Another thing to consider is that this solution also requires to have the
gitlab-runner binary installed locally, which many in the community may not be
comfortable with, but that's up for a debate, but hey, you can always build it
yourself.

There's a plan to rework GitLab's local CI execution functionality heavily [2]
before the 'exec' command is sunset, but ultimately their plan is have
something working by Q3 FY24, which can be anywhere from March 2024 until
September 2024, that's more than a year ahead.

In your reply you mentioned the unappealing complexity of the script
potentially leading to bugs. At the same time though one currently can't consume
.gitlab-ci.yml recipes to run a local integration test in a VM.
So, how about I get rid of the multiplexing and CLI parsing by placing each job
recipe in a standalone script or even going one step further and extract the
commonalities to a separate file that would be source from within each job
script? Would you consider that meeting each other halfway?
As for POSIX compliance, I guess this would be a soft-requirement based on
whether shellcheck was run during review, does gitlab do something better? IIUC
there's no pre-check and you'll only know after a job was executed that there
was an incompatible statement.

[1] https://docs.gitlab.com/runner/commands/#gitlab-runner-exec-deprecated
[2] https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797

Regards,
Erik