I upgraded a cluster from Ubuntu 16.04 to Ubuntu 18.04 this week, and since this wasn't as smooth as I had hoped, I'm posting my experiences here in the hope that they will help others.

First: No production systems were harmed in the making of this report. We have a test cluster, and it really keeps adrenaline levels down. I might have gotten a tad nervous if this had been in production.

Start scenario:

* 2 nodes (we'll call them A and B) running
    * Ubuntu 16.04
    * Patroni 1.4.3 (3rd party)
    * etcd 2.2.5 (from Ubuntu)
    * PostgreSQL 10.8 (from pgdg)
* 1 node (E) running
    * Ubuntu 16.04
    * etcd 2.2.5 (from Ubuntu)

So, A and B are the database nodes and E is just there to provide the quorum for the etcd cluster. (And obviously, Patroni is configured to use etcd.)

Goal: Upgrade to Ubuntu 18.04, leave everything else the same (as far as possible).

At the start of the upgrade, A was the master, so I started with B. do-release-upgrade upgraded it to Ubuntu 18.04 successfully and without any worrying warnings. But after the machine rebooted, etcd wouldn't start up again.

What went wrong? Ubuntu 16.04 came with etcd 2.2.5, Ubuntu 18.04 includes etcd 3.2.17. But you can't upgrade directly from 2.2.x to 3.2.x. You have to upgrade to 2.3.x first, then to 3.0.x and finally to 3.2.x. And you have to do each step for the whole cluster before proceeding to the next.

Since there are no Ubuntu packages for etcd 2.3 and 3.0, I fetched the binary releases for etcd-v2.3.8 and etcd-v3.0.17 from GitHub. The executables are statically linked, so you can just copy them into /usr/bin without worrying about dependencies.

So I restarted B with etcd 2.3 and it joined the cluster. (Of course it wasn't quite so straightforward: while exploring different possibilities I had corrupted /var/lib/etcd, so I had to remove the node from the cluster, clean out the data directory, and add the node again.)
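The per-node step and the cluster-version check could be scripted roughly like this. This is a minimal POSIX shell sketch under stated assumptions, not a record of the exact commands used here: the GitHub download URL pattern (etcd-vX.Y.Z-linux-amd64.tar.gz under github.com/etcd-io/etcd), the amd64 architecture, the systemd unit name "etcd" and the client URL http://localhost:2379 are all assumptions, and the helper names (next_etcd_step, install_etcd_release, cluster_version, check_cluster_version) are hypothetical. The upstream etcd upgrade guides are written one minor release at a time (including 3.0 to 3.1 and 3.1 to 3.2), so check them before relying on this path for your own versions.

```shell
#!/bin/sh
# Sketch of the per-node etcd upgrade step and the cluster-version check.
# ASSUMPTIONS (not from the post): amd64 hosts, GitHub release tarballs named
# etcd-vX.Y.Z-linux-amd64.tar.gz, a systemd unit called "etcd", and the
# client URL http://localhost:2379. All function names are hypothetical.

# Next step of the path described above, given the current cluster version.
next_etcd_step() {
    case "$1" in
        2.2.*) echo v2.3.8 ;;
        2.3.*) echo v3.0.17 ;;
        3.0.*) echo ubuntu-package ;;  # then reinstall etcd-server/etcd-client via apt
        3.2.*) echo done ;;
        *) echo unknown; return 1 ;;
    esac
}

# Download URL of a release tarball (assumed naming scheme).
etcd_release_url() {
    printf 'https://github.com/etcd-io/etcd/releases/download/%s/etcd-%s-linux-amd64.tar.gz\n' "$1" "$1"
}

# Replace /usr/bin/etcd and /usr/bin/etcdctl on THIS node and restart etcd
# (needs root). The release binaries are statically linked, so copying is enough.
install_etcd_release() {
    ver=$1
    tmp=$(mktemp -d) || return 1
    curl -fsSL "$(etcd_release_url "$ver")" -o "$tmp/etcd.tar.gz" &&
        tar -xzf "$tmp/etcd.tar.gz" -C "$tmp" &&
        systemctl stop etcd &&
        cp "$tmp/etcd-$ver-linux-amd64/etcd" "$tmp/etcd-$ver-linux-amd64/etcdctl" /usr/bin/ &&
        systemctl start etcd
    rc=$?
    rm -rf "$tmp"
    return $rc
}

# Extract the "etcdcluster" field from the JSON returned by /version (stdin).
cluster_version() {
    sed -n 's/.*"etcdcluster" *: *"\([^"]*\)".*/\1/p'
}

# Succeed only if the cluster reports the expected major.minor version.
# An unreachable endpoint yields an empty version and therefore fails.
check_cluster_version() {
    got=$(curl -fsS http://localhost:2379/version | cluster_version)
    case "$got" in
        "$1".*) echo "cluster is on $got" ;;
        *) echo "cluster is on ${got:-unknown}, expected $1.x" >&2; return 1 ;;
    esac
}
```

On each node in turn you would run something like "install_etcd_release v2.3.8", wait the 120 seconds, and only move on to the next step once "check_cluster_version 2.3" succeeds. The cluster version only advances after every member runs the new release, which is why each step has to be finished on the whole cluster first.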
Same for the other two nodes. (The docs say you should wait 120 seconds after restarting each node, and that really seems to be necessary; if you are impatient you may just wind up doing it again.) After that, the whole cluster was on protocol version 2.3. This can be checked with "curl http://localhost:2379/version", and again, you really want to check that before proceeding to the next step.

Then I did the same dance for etcd 3.0. And finally I reinstalled etcd-server and etcd-client from the Ubuntu repo. So now B was on 3.2 and the other two nodes were still on 3.0, a compatible combination.

Next problem: Patroni. The 3rd party Patroni package used Python 2.7, and there was a problem in some Python library. Since Ubuntu now also includes a patroni package (although it's version 1.4.2, a bit older than the one we had), I didn't investigate that further and just installed the Ubuntu package. I had to rename the config file and install two additional packages (python3-etcd and python3-etcd3gw) for etcd support, but that was kind of obvious.

So now we had a working cluster again with one machine upgraded to Ubuntu 18.04. Yay! \o/

Next node is E, which is only running etcd. Since it already has 3.0, the upgrade to 3.2 is smooth.

Finally A: Switch the master over to B, run do-release-upgrade, and after the reboot, reinstall patroni (+dependencies, +rename config). And ... everything works.

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@xxxxxx         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>