Re: Update about Autocloud deployment

On Mon, 2016-07-11 at 04:05 +0000, Kushal Das wrote:
> 
> ## fedmsg-hub on the backends double enqueue
> 
> We had to restart fedmsg-hubs few times in the backends servers. In
> between we also found that fedmsg-hub service was happily enqueueing the
> jobs twice (for each compose), and then it got fixed automagically,
> nothing was changed in our configuration or in code.
> 
> We are still not sure why this happened, but we are trying to dig more
> on this.

I had some issues with the openQA and check-compose consumers too,
after upgrading to F24. After poking at it a bit with Ralph, we
concluded that the python-twisted then in stable was causing problems
for fedmsg: fedmsg seemed to be doing the right thing, but twisted was
eating messages. The twisted then in updates-testing (16.2.0-2.fc24)
seemed to make things better, and it has now gone stable. So this might
have been fixed by that update, if you updated the boxes.

> ## fedmsg-hub broke due to a faulty dependency
> 
> Even though we kept our code up and running for weeks, after the
> production deployment we found one of the dependency (fedfind, adamw is
> the upstream author) was broken with the fedora atomic image names, and
> causing our fedmsg-hub instances go crazy. We have informed upstream,
> and got a quick hotfix deployment in few hours after finding the issue.
> 
> For the next release we will make sure if keep it running for longer on
> our internal hardware with messages from production fedmsg. This
> dependency failure was something we should have caught, but could not.

So, a bit of background here... Pungi/productmd compose IDs look like this:

(DISTRONAME)-(RELEASE)-(DATE).(TYPE).(RESPIN)

e.g. Fedora-24-20160711.n.0, where Fedora is the 'distro name', 24 is
the release, 20160711 is the date, 'n' is the type (indicates
'nightly'), and 0 is the respin.
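
To make the field layout concrete, here is a minimal sketch that pulls
the fields out of that example ID. It is not fedfind's or productmd's
actual code; the function name and the returned dict keys are made up
for illustration, and it assumes the distro name contains no '-' and
that the type field is present.

```python
def split_compose_id(compose_id):
    """Split a compose ID of the form DISTRO-RELEASE-DATE.TYPE.RESPIN.

    Illustrative only: assumes exactly two '-' separators and exactly
    two '.' separators, i.e. a hyphen-free distro name and an explicit
    type field.
    """
    # Three hyphen-separated chunks: distro name, release, and the rest.
    distro, release, rest = compose_id.split("-")
    # The rest is dot-separated: date, type, respin.
    date, ctype, respin = rest.split(".")
    return {
        "distro": distro,
        "release": release,
        "date": date,
        "type": ctype,
        "respin": int(respin),
    }


fields = split_compose_id("Fedora-24-20160711.n.0")
print(fields)
```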

fedfind needs to parse all the bits out of the compose ID for various
purposes, so I had some code for parsing compose IDs which naturally
enough used the '-' separators to split the distro name from the
release and the date. This worked fine up till recently, when releng
started doing the 'two-week Atomic' composes - composes of the Atomic
(and some Cloud) images for the latest stable release (so 24 at
present) done nightly - in Pungi 4.

Unfortunately, when they did that, they decided to use 'Fedora-Atomic'
as the 'distro name' for these composes. Their compose IDs look like
this: Fedora-Atomic-24-20160711.0 (they use the 'production' compose
type, where the 'type' identifier is omitted).

You can probably guess what a 'distro name' with a '-' in it does to a
parser which is trying to split fields on that character :/
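
For anyone who wants to see it spelled out, here is a rough sketch of
the failure, plus one possible way around it: splitting from the right
so that any extra '-' ends up in the distro name. This is not the
actual fedfind code or the actual fix; the function names are
hypothetical, and the right-split version assumes the release and date
fields never contain a '-'.

```python
def naive_fields(compose_id):
    """Split on every '-', as a parser expecting three fields might."""
    return compose_id.split("-")


def parse_from_right(compose_id):
    """Split off release and date from the right-hand end.

    Hypothetical alternative: whatever is left over on the left is
    treated as the distro name, hyphens and all.
    """
    distro, release, tail = compose_id.rsplit("-", 2)
    parts = tail.split(".")
    date, respin = parts[0], parts[-1]
    # An omitted type identifier is labelled 'production' here, per the
    # description of the two-week Atomic compose IDs.
    ctype = parts[1] if len(parts) == 3 else "production"
    return distro, release, date, ctype, int(respin)


# Four chunks where a three-field parser expects three:
print(naive_fields("Fedora-Atomic-24-20160711.0"))

# Splitting from the right keeps 'Fedora-Atomic' together:
print(parse_from_right("Fedora-Atomic-24-20160711.0"))
print(parse_from_right("Fedora-24-20160711.n.0"))
```

A parser that unpacks the naive result into exactly three names would
raise ValueError on the Atomic ID, which is roughly the "completely
confused" behaviour described here.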

I actually knew about this weeks ago, but I thought I knew all the
important users of fedfind, and it wasn't causing any fatal
consequences for any of them: none of the others actually need to do
anything with those Fedora-Atomic composes, so the fact that they all
got completely confused by such composes and refused to do anything
with them was just fine. So fixing it wasn't a big priority for me.
I didn't realize you were using this codepath in fedfind for the
autocloud test triggers, so sorry about that!

I do wish people would be more careful with separators when naming
things, though :/
-- 

Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net