Craig Ringer wrote:
> On 19/04/2010 4:51 PM, Craig Ringer wrote:
>
>> There might be another possible approach that uses the system
>> "xmin/xmax" fields of each tuple. That'd permit your incremental dumps
>> to be done read-only, saving you a whole lot of expensive I/O and bloat.
>> I'm just not sure what I'm thinking of will work yet.
>
> Yeah. You can use a SERIALIZABLE transaction and
> txid_current_snapshot(), which almost magically solves your problem. In
> a single call it provides all the details about active and committed
> transactions at the time of snapshot creation that you need. It even
> gives you a list of transaction IDs for uncommitted transactions between
> those ranges, so old uncommitted transactions don't force you to
> repeatedly dump data. That's all you need to know to do intelligent
> incremental backup of a table.
>
> I haven't written the actual test code, but what you should need to do
> (according to my probably flawed understanding) is:
>
> - Begin a read-only SERIALIZABLE transaction

Actually, it looks like READ COMMITTED is preferable, as it gives you
information about any concurrently running transactions in
txid_current_snapshot(), and you can do the actual dump in one statement
anyway.

> - Record txid_current_snapshot(), which you will need
>   for the next run. We'll call the value of the last
>   run's txid_current_snapshot() call 'txprev'.
> - SELECT all rows that have:
>     tablename.xmin > txid_snapshot_xmax(txprev) OR
>     tablename.xmin IN (txid_snapshot_xip(txprev))

I've tested this approach and it appears to work fine, *but* only for
append-only tables. I don't seem to be smart enough to figure out how to
correctly handle UPDATEs and DELETEs. I know the xmax field (which
records the ID of the transaction that deleted or replaced the row) is
key, but I can't figure out a way to use it effectively to record
deletions. I guess I don't understand MVCC, or at least Pg's
implementation of it, even as well as I thought I did (which isn't
very).
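For what it's worth, here's a sketch of what I mean in concrete SQL. The table name "mytable" and the literal snapshot value are made up for illustration; you'd substitute the txid_current_snapshot() text you saved on the previous run. Note the xid-to-bigint comparison via ::text::bigint ignores the transaction ID epoch, so this is not wraparound-safe as written:

```sql
-- Illustrative only; assumes table "mytable" and a saved snapshot
-- value of '1000:1005:1001,1003' from the previous run ('txprev').

-- 1. Record the current snapshot, to be used as 'txprev' NEXT run:
SELECT txid_current_snapshot();

-- 2. Dump every row committed since the previous run's snapshot:
--    either its xmin is at or past that snapshot's xmax, or it was
--    one of the in-progress transactions listed in the snapshot.
SELECT *
FROM mytable
WHERE mytable.xmin::text::bigint >=
        txid_snapshot_xmax('1000:1005:1001,1003'::txid_snapshot)
   OR mytable.xmin::text::bigint IN
        (SELECT txid_snapshot_xip('1000:1005:1001,1003'::txid_snapshot));
```

(I've used >= against txid_snapshot_xmax here, since the snapshot's xmax is the first transaction ID that was *not* yet assigned when the snapshot was taken.)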
Unless someone smarter steps in, or I find the time to learn more of the
details here, I probably can't offer a pre-formed solution to your
problem.

You can always use a trigger-maintained change-history table to track
inserts/updates/deletes, and DELETE FROM ... RETURNING it on each run.
That's simple and easy, but generates plenty of extra I/O to do your
progressive backup/copy.

I'm going to stop talking to myself now.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
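P.S. A minimal sketch of the trigger-maintained change table, in case it helps. All the names ("mytable", "mytable_changes", the integer "id" primary key) are assumptions for illustration, not a tested implementation:

```sql
-- Change log for "mytable"; assumes mytable has an integer PK "id".
CREATE TABLE mytable_changes (
    change_id bigserial PRIMARY KEY,
    op        char(1) NOT NULL,              -- 'I', 'U' or 'D'
    row_id    integer NOT NULL,              -- PK of the affected row
    changed   timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION mytable_log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO mytable_changes (op, row_id) VALUES ('D', OLD.id);
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO mytable_changes (op, row_id) VALUES ('U', NEW.id);
        RETURN NEW;
    ELSE
        INSERT INTO mytable_changes (op, row_id) VALUES ('I', NEW.id);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_change_log
AFTER INSERT OR UPDATE OR DELETE ON mytable
FOR EACH ROW EXECUTE PROCEDURE mytable_log_change();

-- Each backup run consumes the accumulated changes in one statement:
-- DELETE FROM mytable_changes RETURNING *;
```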