Update about Autocloud deployment

Hi all,

Below are my notes on the Autocloud update that happened last week. A
few things went badly and a few went well. I am listing them here, along
with how we plan to make sure the bad things never happen again. I am
CCing Patrick on this mail, as he was the contact point from Fedora
infrastructure (read: he did all the work).

## deployment using ansible to the newly installed systems

We only had to change yum to dnf and remove the old hotfix part from
the roles. The current playbooks and roles seem to be stable, and can
help us in the future too.

## fedmsg-hub on the backends double enqueue

We had to restart the fedmsg-hub services a few times on the backend
servers. In between, we also found that the fedmsg-hub service was
happily enqueueing the jobs twice (for each compose). It then got fixed
automagically; nothing was changed in our configuration or code.

We are still not sure why this happened, but we are digging into it.
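
Until we know the root cause, one possible safeguard is to make the
enqueue step idempotent per compose. The sketch below is only an
illustration of that idea, not what autocloud does today: the
DedupQueue class and the compose_id key are hypothetical names, and the
"seen" set lives in memory, so a real version shared by several
backends would need a common store.

```python
class DedupQueue:
    """Hypothetical guard: queue each compose at most once."""

    def __init__(self):
        self._seen = set()   # compose ids that were already queued
        self.jobs = []       # stand-in for the real job queue

    def enqueue(self, compose_id, job):
        # Skip the job and return False if this compose was queued before.
        if compose_id in self._seen:
            return False
        self._seen.add(compose_id)
        self.jobs.append(job)
        return True
```

With a guard like this, a second message for the same compose is
dropped instead of producing a duplicate job.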

## fedmsg-hub broke due to a faulty dependency

Even though we kept our code up and running for weeks, after the
production deployment we found that one of the dependencies (fedfind;
adamw is the upstream author) broke on the Fedora Atomic image names,
which made our fedmsg-hub instances go crazy. We informed upstream, and
got a hotfix deployed within a few hours of finding the issue.

For the next release we will make sure to keep it running for longer on
our internal hardware with messages from production fedmsg. This
dependency failure was something we should have caught, but did not.
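
As a rough idea of what such a soak run could look like, the sketch
below feeds a list of recorded messages to a handler and collects every
failure instead of stopping at the first one. Both replay() and the
handler argument are hypothetical names used for illustration; they are
not part of autocloud, fedmsg or fedfind.

```python
def replay(messages, handler):
    """Feed recorded messages to handler; return (message, error) pairs."""
    failures = []
    for msg in messages:
        try:
            handler(msg)
        except Exception as exc:
            # Keep going so that one bad message does not hide the others.
            failures.append((msg, exc))
    return failures
```

An empty result over a long enough sample of production messages would
give us more confidence before the next deployment.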

## missing fedmsg(s) from autocloud on completing the testing of a compose

This was a known part of the whole development+release cycle. Sayan had
submitted the patch [1], but there was some slight miscommunication. For my
part, I failed to track the state of this dependency.

For the next release, I will make a release/deployment checklist for
autocloud and get it validated by everyone involved. Most probably we
will add too many minor details to this checklist, but that will help us
keep track of things in any future deployment. Sayan is currently
working to get that particular change into production so that we can send
out fedmsg(s) as required by Adam.

## missing package dependency causing missing bridge on libvirt backend

We also found that a missing link in the dependency chain caused a
missing virtual bridge in the libvirt backend. Patrick helped us find
that adding libvirt as a dependency for that particular box will fix the
issue in the future.

We should test on clean installations while developing next time to make
sure this is not repeated. We should also think about improving the
stage environment.
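
A small pre-flight check on a clean installation could catch this kind
of problem early. The sketch below relies on the fact that Linux exposes
a "bridge" directory under /sys/class/net/<name> for bridge interfaces.
The function name is hypothetical, and "virbr0" in the usage note is
only assumed here because it is libvirt's default bridge name; the
bridge on our backend may be called something else.

```python
import os


def bridge_exists(name, sysfs_root="/sys/class/net"):
    """Return True if an interface called `name` exists and is a bridge."""
    # Bridge interfaces have a "bridge" subdirectory in sysfs.
    return os.path.isdir(os.path.join(sysfs_root, name, "bridge"))
```

For example, bridge_exists("virbr0") could be run right after the
playbook finishes on a fresh box, and the deployment stopped if it
returns False.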

## the new webfrontend is better

Sayan did a good job of making the new webfrontend. We can now point
to the exact failures [2].

In the future we should try to get more input on the features of the
webfrontend. Even though the whole service is made for automation,
this frontend helps us find and point to the right issues found in
the tests.

## autocloud+tunir did what they are supposed to do

After fixing the hiccups, the autocloud service is doing what it is
supposed to do: testing the images. We will put our effort into
better test coverage in the coming months to take advantage of
this new deployment.


Please comment on or suggest whatever you think about the work. This
will help us improve future releases.

[1]
https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure/pull/386
[2] https://apps.fedoraproject.org/autocloud/jobs/66/output#290

Kushal



