tl;dr: I'd like to ask for one of three actions for Bodhi's staging database:

0) Go back to a "normal" Postgres database.
1) Someone (who is not me, due to Fedora 27 time constraints ☺) makes Bodhi and its staging sync playbook BDR compliant.
2) Think of other options?

Background

I've been trying to deploy a Bodhi 2.9.0 beta to staging for a week, starting on Monday the 10th. The staging BDR sync was broken, and given how busy everything was with the Fedora 26 release, it was understandably a "back burner" item. To be clear, I'm not criticizing anyone's response time here, and I know it was a crazy week. However, I want to convey the frustration one might feel from being stuck with a broken database for a time with no "self-service" way to get yourself unstuck.

Thanks to Patrick, that was fixed today, but then I ran into other problems. Patrick helped me solve a lot of those too (thanks!), but there is still a difficult one to resolve: the staging sync script (the one that brings production data over to staging) is a manual SQL script, and it is difficult to get right. It's a lot of DROP TABLE and DROP TYPE statements, and it turned out that it was missing some items. When I attempted to expand the set of items to be more complete, it got into a state that was difficult to recover from, with an error message that was not very helpful ("cache lookup failed for relation 7418164"). Very likely there is some relationship that needs to be dropped first, but psql isn't giving me very useful information to determine what that relationship is ☹

Before Bodhi was using BDR in stage, the delete/recreate part of the sync script was simple: "dropdb bodhi2 && createdb -O bodhi2 bodhi2". It was reliable, and it was guaranteed to get the DB into the same state as production (which is very good for testing migrations).
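
For illustration, the pre-BDR reset described above could be sketched roughly like this. It is only a sketch and not what the actual playbook does: the dump file argument, the DRY_RUN switch, and the function names are all assumptions made for the example. Only the dropdb/createdb pair comes from the old script.

```shell
#!/bin/sh
# Sketch of the pre-BDR staging reset: drop, recreate, then load a production dump.
# ASSUMPTIONS: the dump file path, DRY_RUN, run() and reset_staging_db() are
# hypothetical names for this example only.

# Run a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_staging_db() {
    dump_file="$1"   # hypothetical path to a production SQL dump
    db="bodhi2"
    owner="bodhi2"

    # Each step only runs if the previous one succeeded.
    run dropdb "$db" &&
    run createdb -O "$owner" "$db" &&
    run psql -d "$db" -f "$dump_file"
}
```

Calling it as "DRY_RUN=1 reset_staging_db /path/to/dump.sql" only prints the three commands, which is a cheap way to check the sequence before running it for real. The point of the sketch is that nothing in it depends on knowing which tables or types exist, which is what made the old approach reliable.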

Now that Bodhi is using BDR, we cannot use dropdb: some database objects need to persist because they must be created outside of BDR by an admin (I actually can't remember the details of exactly what can't handle this, but I remember needing to alter it when Bodhi was switched to BDR). This is why we have the manual DROP TABLE/TYPE script. And with this script, it's hard to say for sure that the staging db is in the same state as production before running the migrations. This means that running the migrations on staging might not give me the same experience as I would get on production, which makes stage a little less useful than it would be otherwise. Also, even if we get this script working for now, it will need maintenance to keep it working in the future.

Side question: Is there a better way we could sync the production database to staging than these DROP TABLE/TYPE statements followed by importing the production SQL? If so, that might help me a lot.

In addition to the above, Bodhi itself doesn't work with BDR[0]. It has a number of tables that don't have primary keys (and those tables don't necessarily have natural primary keys either, as most of them are through tables and don't otherwise need PKs), and there are also some warnings that should be studied[1].

Also, Bodhi's code assumes an ACID compliant database, and BDR does remove ACID guarantees in some circumstances. Bodhi's code will need to be studied to look for queries where this might matter (it's possible there aren't any - we just need to make sure, and that takes time). We had talked about the possibility of using a "distributed transaction sync" (was that what it was called?) to make sure all nodes commit before the client is told "success" on a commit, but upon further reading of the docs on that feature I'm not confident that's what the feature does, mostly because the docs are very thin about it (iirc, there's only a sentence or two written about this).
I think we need to do some testing to confirm that is what it means before assuming so, which, again, takes time. These things can be fixed, but I need to be focusing on Fedora 27 goals right now.

Until Bodhi is compliant, Bodhi's staging deployment is less useful to me because I am unable to log in to it when BDR is enabled, which makes it hard for me to test a lot of Bodhi's functionality in staging. And as I detailed above, I also cannot easily sync data from production. (Patrick kindly temporarily turned off BDR for me on staging, so I should be able to log in right now. Thanks!)

PLEA

I would like to move Bodhi's staging database back to a normal Postgres database until I have some time to make it BDR compliant. Making it BDR compliant is not simply about getting it to have primary keys - we need to make sure its queries are safe and/or research the distributed transaction sync, and we need to make the SQL script that drops the database tables work and drop everything (since I learned today that I was missing quite a few things). I think this last part could take a lot of time, so it's not truly an easy fix.

Another option is for someone other than myself to make Bodhi and the staging sync playbook BDR compliant. I personally don't have time to focus on making Bodhi BDR compliant right now, but if someone else has the time and expertise to focus on that, I would welcome the help.

Of course, I'm also open to other ideas if you have them.

I hope my tone comes across as positive and friendly in this e-mail. I know this is a contentious issue and I'm not trying to stir the pot or ruffle any feathers. I am also not opposed to the pursuit of BDR, and I think I understand the motivation of the systems team for wanting to use it. Please assume positive intent from me if anything I wrote disturbs you. My intention is simply to express a problem I am experiencing and to ask for relief, or for help.

Thanks for reading ☺

[0] https://github.com/fedora-infra/bodhi/issues/1318
[1] https://github.com/fedora-infra/bodhi/issues/1618