Hey folks,
I thought I'd summarize where I'm at. Here are the issues I found and what I did about them:
- We ran into an Ansible issue that the PR https://github.com/ansible/ansible/pull/50381 fixes. I've asked pingou to patch batcave since it's basically a one-liner that will keep working with the older prod version.
- When starting a RabbitMQ cluster from scratch, there is a race condition that is documented here: https://www.rabbitmq.com/cluster-formation.html#initial-formation-race-condition
On nodes 02 and 03, I just destroyed the Mnesia database and let each node auto-detect the cluster again:
# systemctl stop rabbitmq-server && rm -rf /var/lib/rabbitmq/mnesia/ && systemctl start rabbitmq-server
It worked fine. I checked with "rabbitmqctl list_users" that all nodes had the same users declared.
- I've also fixed a couple of things in the playbooks that assumed the cluster was already up and set up.
- I've rebuilt collectd-rabbitmq for EPEL8, but apparently we currently only install it in production (not sure why; I think it could be useful in staging too).
- The nagios-plugins-rabbitmq RPM still fails to install because of a dependency bug in perl-Monitoring-Plugin; I've opened a ticket about it:
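To repeat the "rabbitmqctl list_users" check on every node in one go, here is a minimal sketch that compares the sorted user list reported by each node. It assumes it runs from a host that can reach all cluster nodes with `rabbitmqctl -n` (same Erlang cookie), and the function names and node names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: users_on_node and users_consistent are hypothetical helpers.
# Assumes rabbitmqctl can reach each node via -n (shared Erlang cookie).

# Print the sorted user list as reported by one node.
# Returns non-zero if rabbitmqctl itself fails.
users_on_node() {
    local out
    out=$(rabbitmqctl -q -n "$1" list_users) || return 1
    printf '%s\n' "$out" | sort
}

# Return 0 if every given node reports the same user list as the first one,
# 1 if a node differs or cannot be queried, 2 if no node was given.
users_consistent() {
    local reference node current
    [ "$#" -ge 1 ] || return 2
    reference=$(users_on_node "$1") || { echo "cannot query $1" >&2; return 1; }
    shift
    for node in "$@"; do
        current=$(users_on_node "$node") || { echo "cannot query $node" >&2; return 1; }
        if [ "$current" != "$reference" ]; then
            echo "user list on $node differs" >&2
            return 1
        fi
    done
    return 0
}
```

Usage (hypothetical node names): `users_consistent rabbit@node01 rabbit@node02 rabbit@node03 && echo "users match"`. Sorting first means nodes that list the same users in a different order still count as consistent.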
Now we need to recreate the queues, users and bindings, and I don't have the permissions to run all the playbooks. If someone could run the master playbook limited to the staging hosts and the rabbitmq_cluster tag, I think that should recreate all users and queues, and we should be all set.
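Roughly, that run could look like the minimal sketch below. The wrapper name is hypothetical, the playbook path is left as an argument rather than guessed, and "staging" as a `--limit` pattern is an assumption about the inventory; `--limit` (restrict hosts) and `--tags` (run only tagged tasks) are standard ansible-playbook options.

```shell
#!/usr/bin/env bash
# Sketch only: rerun_rabbitmq_cluster is a hypothetical wrapper.
# Assumption: "staging" is a valid host/group pattern in the inventory.
# $1 is the path to the master playbook; any further arguments
# (e.g. --check --diff) are passed through to ansible-playbook.
rerun_rabbitmq_cluster() {
    local playbook=$1
    shift
    # --limit restricts the run to matching hosts; --tags runs only the
    # tasks tagged rabbitmq_cluster.
    ansible-playbook "$playbook" --limit staging --tags rabbitmq_cluster "$@"
}
```

A cautious order would be a dry run first with `rerun_rabbitmq_cluster /path/to/master.yml --check --diff`, then the same call without `--check`.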
I'm around and on IRC if you need me.
Aurélien
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx