On Wed, 2020-06-10 at 13:31 +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
> > > Building artifacts in a separate pipeline stage also doesn't have any
> > > advantages, and only delays further stages by a couple of minutes.
> > > The only job that really makes sense in its own stage is the DCO
> > > check, because it's extremely fast (less than 1 minute) and, if that
> > > fails, we can avoid kicking off all other jobs.
> >
> > The advantage of using stages is that it makes it easy to see at a
> > glance where the pipeline was failing.

Ultimately you'll need to drill down to the actual failure anyway, so
the only situation in which stages would really provide value is if
for some reason *all* cross builds failed at once, which is not
something that happens frequently enough to optimize for.

> > > Reducing the number of stages results in significant speedups:
> > > specifically, going from three stages to two stages reduces the
> > > overall completion time for a full CI pipeline from ~45 minutes[1]
> > > to ~30 minutes[2].
> > >
> > > [1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
> > > [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
> >
> > I don't think this time comparison is showing a genuine difference.
> >
> > If we look at the original staged pipeline, every single individual
> > job took much longer than every individual job in the simplified
> > pipeline. I think the difference in job times accounts for most
> > (possibly all) of the difference in overall pipeline time.
> >
> > If we look at the history of libvirt pipelines:
> >
> >   https://gitlab.com/libvirt/libvirt/pipelines
> >
> > the vast majority of the time we're completing in 30 minutes or
> > less already.

That was before introducing FreeBSD builds, which for whatever reason
take significantly longer: the last couple of jobs both took 50+
minutes. Installing packages is very inefficient, it would seem.

Either way, even looking at earlier pipelines, it seems clear that we
leave compute time on the table: for the last 10 pipelines before
adding FreeBSD, we have

  Longest job | Shortest job
  ------------+-------------
        21:20 |        12:12
        16:11 |        09:04
        21:31 |        13:40
        16:32 |        08:28
        14:53 |        08:16
        16:01 |        07:59
        16:17 |        08:40
        15:30 |        08:49
        15:12 |        09:11
        16:20 |        08:34

which means the pipeline is stalled for at least 5-8 minutes each
time. That's time that we could use to run builds, but instead we
just sit idle and wait. The difference becomes even bigger with
FreeBSD in the mix.

Even from a more semantic point of view, pipeline stages exist to
implement dependencies between jobs: a good example is our container
build jobs, which of course need to happen *before* the build job
that uses that container can start. There are no dependencies
whatsoever between native builds and cross builds.
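To illustrate what I have in mind, here's a rough sketch (job and
image names are made up for the example rather than taken from our
actual .gitlab-ci.yml) of how GitLab's needs: keyword can express
that relationship directly, letting a build job start as soon as the
one container job it depends on has finished instead of waiting for
a whole stage boundary:

  stages:
    - containers
    - builds

  x86_64-debian-10-container:
    stage: containers
    script:
      # build the CI container image and push it to the registry
      - docker build -t $CI_REGISTRY_IMAGE/ci-debian-10 ci/containers/debian-10
      - docker push $CI_REGISTRY_IMAGE/ci-debian-10

  x86_64-debian-10:
    stage: builds
    image: $CI_REGISTRY_IMAGE/ci-debian-10
    # start as soon as the matching container job has finished,
    # without waiting for the rest of the containers stage
    needs:
      - x86_64-debian-10-container
    script:
      - mkdir build && cd build
      - ../autogen.sh
      - make

A cross build job would list only its own container job under
needs:, so native and cross builds can run in parallel with no
artificial ordering between them.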
> > If you want to demonstrate a time improvement from these merged
> > stages, then run 20 pipelines over a couple of days and show
> > that they're consistently better than what we see already, and
> > not just a reflection of the CI infra load at a point in time.

I could do that, sure, it just seems like a waste of shared runner
CPU time...

> Also remember that we're using ccache, so slower builds may just be a
> reflection of the ccache having low hit rate - a sequence of repeated
> builds of the same branch should identify if that's the case.

I've been running builds pretty much non-stop over the past few days,
and since the cache is keyed off the job's name there should be no
significant skew caused by this.

-- 
Andrea Bolognani / Red Hat / Virtualization