We just got bitten by something we didn't foresee when completely uninstalling a slony replication cluster from the master and slave...
MAKE SURE YOU STOP YOUR APPLICATION RUNNING AGAINST YOUR MASTER DATABASE WHEN REMOVING THE WHOLE SLONY CLUSTER, or at least re-cycle all your open connections after the event!
The connections appear to "remember" or refer to objects which are removed by the uninstall node script. And you get lots of errors as a result (and possible data loss??)...
Question: Why do our database objects still refer to removed slony objects after they are removed?
John Sidney-Woollett
More info...
Our system is a web application which runs against a postgres 7.4.6 database. Tomcat is restarted at 5am each day.
Last Friday afternoon we uninstalled the slony cluster (1 master + 1 slave) so that we could add a new second slave. (I find it easier to uninstall the cluster and then reapply a new setup, subscribe script rather than trying to add the single node.)
The cluster was successfully removed, and then rebuilt with 1 master and 2 slave nodes.
However, we didn't stop and start our web application which uses Tomcat connection pool and continued to run against the master (during the uninstall and rebuild). Everything appeared fine.
Only today while checking something else did I notice lots of ERROR: relation with OID 1036711 does not exist errors in the postgres serverlog
In our tomcat application we also saw lots of errors like
org.postgresql.util.PSQLException: ERROR: relation with OID 1036711 does not exist
at org.postgresql.util.PSQLException.parseServerError(PSQLException.java:139)
at org.postgresql.core.QueryExecutor.executeV3(QueryExecutor.java:154)
at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:101)
at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:43)
...
Basically these errors started when the cluster was uninstalled and continued until the web application was restarted (at 5am the following morning).
I'm not sure what object had OID 1036711 - maybe the slony replication schema?? Is there a way to find out after the event?
I do have daily full backups of the master each day going back 1 month - I could load one into another database and lookup the OID, if someone could explain how to do that. And if that would be useful?
But the net effect is that we got bizarre failures in our application, and large gaps in primary key values (using sequences) in certain tables within the database. On the whole the app seemed to be behaving fine, with the odd error that we put down to user error... (bad assumption, I guess in hindsight)
At the moment I'm trying to figure out exactly what kind of damage our data has suffered.
If anyone has any ideas or suggestions as to what went wrong or what to check for, I'd appreciate hearing.
Thanks
John
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend