On Mon, 2016-07-11 at 04:05 +0000, Kushal Das wrote:
> 
> ## fedmsg-hub on the backends double enqueue
> 
> We had to restart fedmsg-hubs few times in the backends servers. In
> between we also found that fedmsg-hub service was happily enqueueing the
> jobs twice (for each compose), and then it got fixed automagically,
> nothing was changed in our configuration or in code.
> 
> We are still not sure why this happened, but we are trying to dig more
> on this.

I had some issues with the openQA and check-compose consumers too, after upgrading to F24. After poking at it a bit with Ralph, we concluded that the python-twisted build then in stable was causing problems for fedmsg: fedmsg seemed to be doing the right thing, but twisted was eating messages and stuff. The twisted build then in updates-testing (16.2.0-2.fc24) seemed to make things better, and it has now gone stable. So this might have been fixed by that update, if you updated the boxes.

> ## fedmsg-hub broke due to a faulty dependency
> 
> Even though we kept our code up and running for weeks, after the
> production deployment we found one of the dependency (fedfind, adamw is
> the upstream author) was broken with the fedora atomic image names, and
> causing our fedmsg-hub instances go crazy. We have informed upstream,
> and got a quick hotfix deployment in few hours after finding the issue.
> 
> For the next release we will make sure if keep it running for longer on
> our internal hardware with messages from production fedmsg. This
> dependency failure was something we should have caught, but could not.

So, a bit of background here... Pungi/productmd compose IDs look like this:

(DISTRONAME)-(RELEASE)-(DATE).(TYPE).(RESPIN)

e.g. Fedora-24-20160711.n.0, where 'Fedora' is the distro name, 24 is the release, 20160711 is the date, 'n' is the type (indicating 'nightly'), and 0 is the respin.
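For illustration, here is a minimal Python sketch of parsing that format. The `parse_compose_id` helper and `ComposeID` type are hypothetical, not fedfind's actual code. It assumes the last part may appear either as DATE.TYPE.RESPIN or as DATE.RESPIN when the type identifier is omitted (as for 'production' composes).

```python
from typing import NamedTuple, Optional


class ComposeID(NamedTuple):
    """Fields of a compose ID (hypothetical type, for illustration only)."""
    distro: str
    release: str
    date: str
    ctype: Optional[str]  # None when the TYPE field is omitted
    respin: int


def parse_compose_id(cid: str) -> ComposeID:
    """Hypothetical parser for DISTRONAME-RELEASE-DATE[.TYPE].RESPIN."""
    # Split from the right: the last two '-'-separated fields are RELEASE
    # and DATE[.TYPE].RESPIN; everything before them is the distro name,
    # even if that name itself contains a '-'.
    parts = cid.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"not a compose ID: {cid!r}")
    distro, release, tail = parts

    fields = tail.split(".")
    if len(fields) == 3:
        date, ctype, respin = fields
    elif len(fields) == 2:
        # Assumption: no TYPE field means a 'production' compose.
        date, respin = fields
        ctype = None
    else:
        raise ValueError(f"bad DATE.TYPE.RESPIN field: {tail!r}")
    return ComposeID(distro, release, date, ctype, int(respin))
```

Calling `parse_compose_id("Fedora-24-20160711.n.0")` returns `ComposeID(distro='Fedora', release='24', date='20160711', ctype='n', respin=0)`. Splitting from the right with `rsplit("-", 2)` means a distro name containing '-' still parses, whereas a plain left-to-right `split("-")` would yield too many fields for such an ID.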
fedfind needs to parse all the bits out of the compose ID for various purposes, so I had some code for parsing compose IDs which, naturally enough, used the '-' separators to split the distro name from the release and the date. This worked fine until recently, when releng started doing the 'two-week Atomic' composes in Pungi 4: nightly composes of the Atomic (and some Cloud) images for the latest stable release (24 at present). Unfortunately, when they did that, they decided to use 'Fedora-Atomic' as the 'distro name' for these composes. Their compose IDs look like this:

Fedora-Atomic-24-20160711.0

(they use the 'production' compose type, for which the 'type' identifier is omitted). You can probably guess what a 'distro name' with a '-' in it does to a parser that splits fields on that character :/

I actually knew about this weeks ago, but I thought I knew all the important users of fedfind, and it wasn't causing fatal consequences for any of them. None of the others actually need to do anything with those Fedora-Atomic composes, so the fact that they all got completely confused by such composes and refused to do anything with them was just fine. So fixing it wasn't a big priority for me. I didn't realize you were using this codepath in fedfind for the autocloud test triggers, so sorry about that!

I do wish people would be more careful with separators when naming things, though :/

-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net