On 11/01/2017 03:18 PM, Jeremy Cline wrote:
> Hey folks,
>
> The latest version of FMN in production includes a patch[0] that breaks
> all the rules that query for package watchers, resulting in this[1]
> infrastructure issue. There's an open PR[2] on FMN that fixes the issue
> (reviews welcome). To fix this we have two options.
>
> The first is to backport it to the current version in production (1.5),
> which should be trivial since nothing in this area has been touched in
> 2.0. We can then update production and carry on.
>
> The second option is to update production to 2.0 now (I've included [2]
> as a patch in the RPM currently in stage). 2.0 includes a re-write of
> the back-end components of FMN to use Celery. It's running in stage now.
> Things to note about this:
>
> * The FMN back-end now requires F26 because of celery versions.
>
> * The FMN front-end is currently still on RHEL7, but I haven't updated
>   it in stage yet, so I don't know if there are any adjustments
>   necessary for that (the front-end doesn't use celery, so the fact
>   that it's old _shouldn't_ be a problem).
>
> * Some care will need to be taken to switch over AMQP queue-wise,
>   especially because the current FMN queues are jammed with
>   unformattable messages it keeps requeuing (about 25K of them). We
>   could also just cut our losses and drop these.
>
> * The scripts that monitor queue length will need to be adjusted since
>   there are more queues now and existing queues have been renamed.
>
> One thing to note is that we're going to have to go through all those
> things above at some point anyway. FMN also doesn't really have
> anything to do with the release process, so if it all goes south during
> the freeze it shouldn't matter.
>
> I don't have a preference one way or the other, really. Whatever makes
> the admins happy makes me happy.

I'm a bit torn on this one.
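As a minimal sketch of the first step of the queue switch-over, here is one way to tally what the ~25K stuck messages actually are. This assumes the message bodies are fedmsg-style JSON with a "topic" key; actually fetching them off the AMQP queue (e.g. with pika's `channel.basic_get`) depends on the broker setup and isn't shown.

```python
# Hedged sketch: count fedmsg topics across raw message bodies so we
# can tell whether the stuck messages are all, say, copr rebuilds.
# Bodies that fail to parse as JSON are counted separately.
import json
from collections import Counter

def summarize_topics(raw_bodies):
    """Return a Counter mapping topic -> number of messages."""
    counts = Counter()
    for body in raw_bodies:
        try:
            msg = json.loads(body)
        except ValueError:
            counts["<unparseable>"] += 1
            continue
        counts[msg.get("topic", "<no topic>")] += 1
    return counts

# Example with made-up bodies (the copr topic name is an assumption):
bodies = [
    '{"topic": "org.fedoraproject.prod.copr.build.end"}',
    '{"topic": "org.fedoraproject.prod.copr.build.end"}',
    'not json at all',
]
print(summarize_topics(bodies).most_common())
```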
It seems a bit of a rush to push 2.0 into prod without having tested the
front-ends and confirmed the fix for watchers, but on the other hand
nothing around the release should block on this, and it would be nice to
get prod onto a code base that we have more confidence in and more
ability to fix going forward.

Do we have any way to tell what all those 25K bad messages are? Are they
likely the copr rubygems rebuild ones? If that's all they are, I am fine
with dropping them and starting afresh.

Can you make a patch to fix the monitoring scripts and attach it, and
update the staging front-ends and confirm they are ok? With those in
hand, I would be +1 to just upgrade and drop the old messages, I think.

kevin
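For the monitoring-script adjustment being asked for here, a rough sketch of the core logic: parse `rabbitmqctl list_queues name messages` output and flag watched queues over a backlog threshold. The queue names and the threshold below are hypothetical, and the real monitoring scripts may gather queue depths by a different mechanism entirely.

```python
# Hedged sketch: queue-length check against rabbitmqctl-style output.
def parse_queue_lengths(output):
    """Map queue name -> message count from `rabbitmqctl list_queues
    name messages` output; header/other lines are skipped."""
    lengths = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            lengths[parts[0]] = int(parts[1])
    return lengths

def over_threshold(lengths, watched, limit=1000):
    """Return the watched queues whose backlog exceeds the limit."""
    return {q: n for q, n in lengths.items() if q in watched and n > limit}

# Example (queue names are made up for illustration):
sample = "Listing queues ...\nfmn.backends 25000\nfmn.worker 3\n"
lengths = parse_queue_lengths(sample)
print(over_threshold(lengths, {"fmn.backends", "fmn.worker"}))
# → {'fmn.backends': 25000}
```

With the renamed 2.0 queues, only the `watched` set and thresholds should need updating, which keeps the patch small.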
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx