Jeffrey/All:

I've been corresponding off-list with Elliot about upgrading db1. At his suggestion, I'm going to run my proposed process by everyone for feedback. I went through a production DB server upgrade a while back, and this process reflects what I did then and the lessons learned along the way. I also had Slony in the mix, which added a few more steps.

From what I've read on the wiki pages about the current Fedora infrastructure setup, there is only 1 DB server, a SPOF. It may be worth the time to start a separate conversation about some sort of replication or clustered setup, which is a beast unto itself. I have a master/slave replication setup with Slony which works quite well. You still have a SPOF with the master, but at least you can load balance between the slaves and keep read access going if the master goes down. With all of the research I've done, I've not found anything production ready that allows a multi-master scenario for Postgres, only a master/slave scenario with replication, like Slony. The closest thing I found was PGCluster, but it was very flaky, at least on 8.x, the last time I played with it late last year. PGCluster was also very hardware intensive, meaning it takes something like 4 machines to run a setup without a SPOF.

The following assumes the desire for as little downtime as possible, and as a result is a little more involved than if more downtime were tolerable. I welcome comments and refinements to the process. Now, onto the upgrade process.
*** Make a DB and filesystem backup ***
*** Ensure that all necessary config files are archived so they can be quickly reinstalled via kickstart later ***

1) Set up all apps to connect to the DB using a host alias in /etc/hosts
2) Set up another server as a temporary DB server
   Note: An option here is to copy the data to the temp DB server and do an immediate cutover after the copy, but you run the risk of the data being out of sync between the master and the temp server. Danger, Will Robinson!
3) Temporarily disable apps hitting the DB
4) pg_dump the DBs and template0, users, groups, etc. over to the temp server
5) Test to ensure the temp DB is up and accepting connections
6) Change the alias entry in /etc/hosts on the hosts running apps to the record for the temp DB server
7) Re-enable the apps
8) Re-install and configure the OS on the old DB server via kickstart
9) Temporarily disable apps hitting the DB
10) pg_dump the DBs and template0, users, groups, etc. back over to the master server
11) Test to ensure the master DB is up and accepting connections
12) Change the alias entry in /etc/hosts on the servers running apps back to the record for the master DB server

There, my 12-step program for a DB upgrade. :-) I had a few more Slony-related steps which required shuffling the app servers between the Slony slaves and the master, but the above is basically the process.

I made a few decisions/assumptions during my process. In my case, since 99% of DB hits were reads, I still had my Slony slaves accepting read requests. I was able to do this since I maintain both read and write DB handles in the SoftSwitch. Some apps allow a "degraded" mode where data is read-only just for this sort of thing; I'm not sure if that's the case here or not. During the time I disabled the apps to copy the DB over to the temp server, users just couldn't log in to the web portal to change their settings (which are written to the master) for a few minutes. I had to ask: would this really be an issue for 5 min at 3am CST on a Sunday morning?
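To make the alias trick in steps 1, 6, and 12 concrete, here's a minimal sketch. The alias name ("db") and the IPs are made up; the script works on a scratch file so it's safe to run as-is, but for real use you'd point HOSTS at /etc/hosts on each app server.

```shell
#!/bin/sh
# Sketch of steps 1, 6, and 12: apps connect to the DB through a host
# alias (here called "db" -- the name and IPs are hypothetical), so a
# cutover is a one-line change in /etc/hosts on each app server.
# This demo uses a scratch file; set HOSTS=/etc/hosts for real use.
HOSTS=$(mktemp)
printf '10.0.0.5\tdb\n' > "$HOSTS"   # hypothetical current master entry

point_db_at() {
    # Drop any existing line for the "db" alias, then append the new one.
    sed -i '/[[:space:]]db$/d' "$HOSTS"
    printf '%s\tdb\n' "$1" >> "$HOSTS"
}

point_db_at 10.0.0.12   # step 6: cut over to the temp DB server
cat "$HOSTS"
```

Since the apps only ever see the alias, no app configs have to change during the cutover, just this one hosts entry.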
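And a sketch of what steps 4 and 10 look like on the wire, piping each dump host to host so no intermediate file is needed. The hostnames and database names here are hypothetical, and the script is a dry run that only prints the commands; replace the body of run() with eval "$*" to actually execute them.

```shell
#!/bin/sh
# Sketch of steps 4 and 10: copy users/groups and each database from the
# old server straight into the new one. Hostnames and DB names are
# hypothetical. run() only prints each command (a dry run).
OLD=db1.fedora.example       # the old server being replaced
NEW=db-temp.fedora.example   # the temporary server

run() { echo "+ $*"; }

# Global objects first: pg_dumpall -g dumps only the globals
# (users, groups), not the databases themselves.
run "pg_dumpall -g -h $OLD | psql -h $NEW postgres"

# Then each database; run the newer pg_dump against the old server so
# the newer tools do the reading.
for db in fedora_app fedora_accounts; do
    run "createdb -h $NEW $db"
    run "pg_dump -h $OLD $db | psql -h $NEW $db"
done
```

The same pipeline runs in the other direction for step 10, with OLD and NEW swapped.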
(This assumes that the DBs can be copied in the span of 5 min; I have a VLAN'd GigE management network for this sort of thing.) The same downtime was experienced when I switched back over to the master DB.

Since I'm assuming we're upgrading from Postgres 7.x, we'll definitely want to do a pg_dump/pg_restore, since there are inherent differences in the on-disk data structures. In the past I've just stopped Postgres, copied /var/lib/pgsql over to the new server, started Postgres there, and called it good, but you can't do that when upgrading from 7.x to 8.x. Since the data itself has to be migrated from the 7.x format to the 8.x format at some point, there is probably going to have to be some measure of downtime. Otherwise you get into having to compare data in 2 different databases to see if anything changed and manually replicating those changes to the master copy. For me, the 5 min of partial downtime was _much_ cheaper.

I look forward to discussion on this.

Cheers,
-Curt