On 11/07/16, Adam Williamson wrote:
> On Mon, 2016-07-11 at 04:05 +0000, Kushal Das wrote:
> >
> > ## fedmsg-hub on the backends double enqueue
> >
> > We had to restart the fedmsg-hubs a few times on the backend servers. In
> > between we also found that the fedmsg-hub service was happily enqueueing
> > the jobs twice (for each compose), and then it got fixed automagically;
> > nothing was changed in our configuration or in code.
> >
> > We are still not sure why this happened, but we are trying to dig more
> > on this.
>
> I had some issues with the openQA and check-compose consumers too,
> after upgrading to F24; after poking it a bit with Ralph we concluded
> the python-twisted then in stable was causing issues with fedmsg,
> fedmsg seemed to be doing the right thing but twisted was eating
> messages and stuff. The twisted then in updates-testing - 16.2.0-2.fc24
> - seemed to make things better, and it's now gone stable. So this might
> have got fixed by that update, if you updated the boxes.
>
> > ## fedmsg-hub broke due to a faulty dependency
> >
> > Even though we kept our code up and running for weeks, after the
> > production deployment we found one of the dependencies (fedfind, adamw
> > is the upstream author) was broken with the Fedora Atomic image names,
> > and causing our fedmsg-hub instances to go crazy. We have informed
> > upstream, and got a quick hotfix deployment a few hours after finding
> > the issue.
> >
> > For the next release we will make sure to keep it running for longer on
> > our internal hardware with messages from production fedmsg. This
> > dependency failure was something we should have caught, but could not.
>
> So a bit of background here... Pungi/productmd compose IDs look like this:
>
> (DISTRONAME)-(RELEASE)-(DATE).(TYPE).(RESPIN)
>
> e.g.: Fedora-24-20160711.n.0, where Fedora is the 'distro name', 24 is
> the release, 20160711 is the date, 'n' is the type (indicates
> 'nightly'), and 0 is the respin.
>
> fedfind needs to parse all the bits out of the compose ID for various
> purposes, so I had some code for parsing compose IDs which naturally
> enough used the '-' separators to split the distro name from the
> release and the date. This worked fine up till recently, when releng
> started doing the 'two-week Atomic' composes - composes of the Atomic
> (and some Cloud) images for the latest stable release (so 24 at
> present) done nightly - in Pungi 4.
>
> Unfortunately, when they did that, they decided to use 'Fedora-Atomic'
> as the 'distro name' for these composes. Their compose IDs look like
> this: Fedora-Atomic-24-20160711.0 (they use the 'production' compose
> type, where the 'type' identifier is omitted).
>
> You can probably guess what a 'distro name' with a - in it does to a
> parser which is trying to split fields on that character :/
>
> I actually knew about this weeks ago, but I thought I knew all the
> important users of fedfind and it wasn't really causing any fatal
> consequences for any of them (because none of the others actually need
> to do anything with those Fedora-Atomic composes, so the fact that they
> all got completely confused by such composes and refused to do anything
> with them was just fine), so it wasn't a big priority for me to fix it.
> I didn't realize you were using this codepath in fedfind for the
> autocloud test triggers, so sorry about that!
>

You helped us to have the hotfix ready, and also made a new release of
the upstream package. That was an amazing help, thank you once again
for that :)

Kushal
--
Fedora Cloud Engineer
CPython Core Developer
https://kushaldas.in
https://dgplug.org
--
test mailing list
test@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe:
https://lists.fedoraproject.org/admin/lists/test@xxxxxxxxxxxxxxxxxxxxxxx
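
For anyone following the compose ID discussion above, here is a minimal
sketch of the failure mode Adam describes. It is not fedfind's actual code:
the names ComposeId, naive_parse and rsplit_parse are hypothetical, and the
only inputs assumed are the two compose IDs quoted in the thread. The naive
version splits on every '-', so a distro name like 'Fedora-Atomic' produces
too many fields. One possible workaround (an assumption, not necessarily
what the hotfix did) is to split from the right, since the release and date
fields come last.

```python
from typing import NamedTuple


class ComposeId(NamedTuple):
    """Fields of a compose ID (hypothetical container, not a fedfind type)."""
    distro: str
    release: str
    date: str
    type: str    # '' stands for the 'production' type, whose identifier is omitted
    respin: str


def _split_tail(rest: str):
    """Split 'DATE.TYPE.RESPIN' or 'DATE.RESPIN' into its parts."""
    date, *ctype, respin = rest.split(".")
    return date, "".join(ctype), respin


def naive_parse(cid: str) -> ComposeId:
    """Split on every '-'; assumes the distro name contains no '-'."""
    distro, release, rest = cid.split("-")  # ValueError on 'Fedora-Atomic-...'
    return ComposeId(distro, release, *_split_tail(rest))


def rsplit_parse(cid: str) -> ComposeId:
    """Split only the last two '-' so the distro name may itself contain '-'."""
    distro, release, rest = cid.rsplit("-", 2)
    return ComposeId(distro, release, *_split_tail(rest))


if __name__ == "__main__":
    for cid in ("Fedora-24-20160711.n.0", "Fedora-Atomic-24-20160711.0"):
        try:
            print("naive :", naive_parse(cid))
        except ValueError as err:
            print("naive : failed on", cid, "->", err)
        print("rsplit:", rsplit_parse(cid))
```

Splitting from the right still assumes the release and date fields never
contain a '-' themselves, so it is a sketch of the idea rather than a
complete parser.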