Re: queries on xmin

Greg Stark <stark@xxxxxxxxxxxxxxxx> · Thu, 11 Jun 2009 13:22:25 +0100

On Thu, Jun 11, 2009 at 12:59 PM, Brett Henderson<brett@xxxxxxxxxx> wrote:
> I have a couple of hesitations with using this approach:
> 1. We can only run the replicator once.
> 2. We can only run a single replicator.
> 3. It requires write access to the db.
>
> 1 is perhaps the biggest issue.  It means that we only get one shot at
> reading changes, and if something goes wrong we lose the results.  It's nice
> being able to re-generate when something goes wrong.

I was picturing only actually committing the update once you're sure
all the files are generated. So if something goes wrong you roll back
the database update.
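
A minimal sketch of that commit-after-files idea, in Python. It uses the
sqlite3 module only as a stand-in so the example runs anywhere. The table
"changes", the column "replicated" and the callback write_files are
hypothetical names, not anything from your schema.

```python
import sqlite3

def replicate_once(conn, write_files):
    """Mark pending rows as replicated; commit only if write_files succeeds.

    `changes`, `replicated` and `write_files` are hypothetical names.
    """
    cur = conn.cursor()
    try:
        rows = cur.execute(
            "SELECT id, payload FROM changes WHERE replicated = 0 ORDER BY id"
        ).fetchall()
        # Flag exactly the rows we read; the change stays uncommitted for now.
        cur.executemany(
            "UPDATE changes SET replicated = 1 WHERE id = ?",
            [(row[0],) for row in rows],
        )
        write_files(rows)  # may raise; nothing has been committed yet
        conn.commit()      # files exist, so make the flags permanent
        return rows
    except Exception:
        conn.rollback()    # files failed: the rows stay pending for a retry
        raise
```

If write_files raises, the flags are rolled back and the same rows are
picked up on the next run.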

Another option would be to use an integer "batch_id" column instead of
a boolean. Then you could recreate a previous batch if the file is
subsequently lost. An integer takes more space than a boolean but due
to alignment issues it often works out the same.
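
Roughly what I mean by a batch_id, sketched in Python with sqlite3 as a
stand-in. The table "changes" and the helpers assign_batch and read_batch
are hypothetical, and the sketch assumes a single replicator, so there is
no race on picking the next id.

```python
import sqlite3

def assign_batch(conn):
    """Stamp every unbatched row with the next batch id and return that id."""
    cur = conn.cursor()
    next_id = cur.execute(
        "SELECT COALESCE(MAX(batch_id), 0) + 1 FROM changes"
    ).fetchone()[0]
    cur.execute(
        "UPDATE changes SET batch_id = ? WHERE batch_id IS NULL", (next_id,)
    )
    conn.commit()
    return next_id

def read_batch(conn, batch_id):
    """Return the rows of one batch; safe to call again if a file is lost."""
    return conn.execute(
        "SELECT id, payload FROM changes WHERE batch_id = ? ORDER BY id",
        (batch_id,),
    ).fetchall()
```

Because the rows keep their batch number, read_batch can be called again
for an old batch whenever its file needs to be rebuilt.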

> We could live with 2, although it makes it impossible to test new
> replication mechanisms without adding additional columns for each.

Well, four boolean columns would, depending on the rest of the table
layout, probably take the same space as one anyway due to alignment
issues.
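
As an analogy only, the same padding effect shows up in C struct layout,
which Python's struct module can report. This does not compute
PostgreSQL's actual on-disk tuple layout. It assumes the flag columns are
followed by a wider aligned field, here a 64-bit integer.

```python
import struct

# "@" selects native C alignment; "?" is a 1-byte bool, "i" an int,
# "q" a 64-bit integer that must start on an aligned boundary.
one_bool = struct.calcsize("@?q")       # one flag, then the wide field
four_bools = struct.calcsize("@????q")  # four flags, then the wide field
one_int = struct.calcsize("@iq")        # an int batch id, then the wide field

# The padding in front of the wide field absorbs the difference.
same_size = one_bool == four_bools == one_int
```

Whether a real table behaves the same way depends on which columns
surround the flags, as noted above.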

> 3 is also a major consideration, it makes everybody's life easier if we can
> avoid updates being made to the db by the replicator.

Yeah, having to update every record once would have a major impact. It
would mean twice as many tuples in the table and every index (except
the partial index). Vacuum would be a major issue for avoiding bloat
in both the table and the indexes.

I'm trying to see how to do it without at least updating, or inserting
into a separate queue table, but I'm not immediately seeing a good
solution. I've done something similar to what you're doing now once,
but there the transactions were simple inserts: we just selected all
the work for the previous hour at 5 minutes past the hour and were
done with it. In your case you would have to delay replication by
30 minutes or more.
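
For what it's worth, the hourly job only had to work out the window for
the last complete hour, something like the Python sketch below. The
function name previous_hour_window is hypothetical, and it assumes each
row carries an insert timestamp to filter on.

```python
from datetime import datetime, timedelta

def previous_hour_window(now):
    """Return (start, end) of the last complete hour before `now`.

    Intended as a half-open range: start <= row_time < end.
    """
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end

# A job running at 13:05 would select rows from 12:00 up to, but not
# including, 13:00.
start, end = previous_hour_window(datetime(2009, 6, 11, 13, 5))
```

The 5 minute delay was only safe because every insert transaction for
that hour had committed by then, which is the part that does not carry
over to longer transactions.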

Implementing what you describe with txid would depend on a lot of
internal logic that could change in future releases. It would also
tie you to never updating or deleting rows in the updates, which seems
like it might be a problem in the future.

-- 
Gregory Stark
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)