Craig Ringer wrote:
> On 19/04/2010 4:51 PM, Craig Ringer wrote:
>
>> There might be another possible approach that uses the system
>> "xmin/xmax" fields of each tuple. That'd permit your incremental dumps
>> to be done read-only, saving you a whole lot of expensive I/O and bloat.
>> I'm just not sure what I'm thinking of will work yet.
>
> Yeah. You can use a SERIALIZABLE transaction and
> txid_current_snapshot(), which almost magically solves your problem. In
> a single call it provides all the details about active and committed
> transactions at the time of snapshot creation that you need. It even
> gives you a list of transaction IDs for uncommitted transactions between
> those ranges, so old uncommitted transactions don't force you to
> repeatedly dump data. That's all you need to know to do intelligent
> incremental backup of a table.
>
> I haven't written the actual test code, but what you should need to do
> (according to my probably flawed understanding) is:
>
> - Begin a read-only SERIALIZABLE transaction

Actually, it looks like READ COMMITTED is preferable, as it gives you
information about any concurrently running transactions in
txid_current_snapshot(), and you can do the actual dump in one statement
anyway.

> - Record txid_current_snapshot(), which you will need
>   for the next run. We'll call the value of the last
>   run's txid_current_snapshot() call 'txprev'.
> - SELECT all rows that have:
>     tablename.xmin > txid_snapshot_xmax(txprev) OR
>     tablename.xmin IN (txid_snapshot_xip(txprev))

I've tested this approach and it appears to work fine, *but* only for
append-only tables. I don't seem to be smart enough to figure out how to
correctly handle UPDATEs and DELETEs. I know the xmax field (which
records the ID of the transaction that deleted or replaced the row) is
key, but I can't figure out a way to use it effectively to record
deletions. I guess I don't understand MVCC, or at least Pg's
implementation of it, even as well as I thought I did (which isn't
very).
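For what it's worth, here's a sketch of what I mean in concrete SQL. The table name "mytable" and the literal snapshot value are made up for illustration; you'd substitute the txid_current_snapshot() text you saved on the previous run. Note the xid-to-bigint comparison via ::text::bigint ignores the transaction ID epoch, so this is not wraparound-safe as written:

```sql
-- Illustrative only; assumes table "mytable" and a saved snapshot
-- value of '1000:1005:1001,1003' from the previous run ('txprev').

-- 1. Record the current snapshot, to be used as 'txprev' NEXT run:
SELECT txid_current_snapshot();

-- 2. Dump every row committed since the previous run's snapshot:
--    either its xmin is at or past that snapshot's xmax, or it was
--    one of the in-progress transactions listed in the snapshot.
SELECT *
FROM mytable
WHERE mytable.xmin::text::bigint >=
        txid_snapshot_xmax('1000:1005:1001,1003'::txid_snapshot)
   OR mytable.xmin::text::bigint IN
        (SELECT txid_snapshot_xip('1000:1005:1001,1003'::txid_snapshot));
```

(I've used >= against txid_snapshot_xmax here, since the snapshot's xmax is the first transaction ID that was *not* yet assigned when the snapshot was taken.)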
Unless someone smarter steps in, or I find the time to learn more of the
details here, I probably can't offer a pre-formed solution to your
problem.

You can always use a trigger-maintained change-history table to track
inserts/updates/deletes, and DELETE FROM ... RETURNING it on each run.
That's simple and easy, but generates plenty of extra I/O to do your
progressive backup/copy.

I'm going to stop talking to myself now.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
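P.S. A minimal sketch of the trigger-maintained change table, in case it helps. All the names ("mytable", "mytable_changes", the integer "id" primary key) are assumptions for illustration, not a tested implementation:

```sql
-- Change log for "mytable"; assumes mytable has an integer PK "id".
CREATE TABLE mytable_changes (
    change_id bigserial PRIMARY KEY,
    op        char(1) NOT NULL,              -- 'I', 'U' or 'D'
    row_id    integer NOT NULL,              -- PK of the affected row
    changed   timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION mytable_log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO mytable_changes (op, row_id) VALUES ('D', OLD.id);
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO mytable_changes (op, row_id) VALUES ('U', NEW.id);
        RETURN NEW;
    ELSE
        INSERT INTO mytable_changes (op, row_id) VALUES ('I', NEW.id);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_change_log
AFTER INSERT OR UPDATE OR DELETE ON mytable
FOR EACH ROW EXECUTE PROCEDURE mytable_log_change();

-- Each backup run consumes the accumulated changes in one statement:
-- DELETE FROM mytable_changes RETURNING *;
```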