On Thu, Jun 11, 2009 at 12:59 PM, Brett Henderson <brett@xxxxxxxxxx> wrote:
> I have a couple of hesitations with using this approach:
> 1. We can only run the replicator once.
> 2. We can only run a single replicator.
> 3. It requires write access to the db.
>
> 1 is perhaps the biggest issue. It means that we only get one shot at
> reading changes, and if something goes wrong we lose the results. It's nice
> being able to re-generate when something goes wrong.

I was picturing only actually committing the update once you're sure all the
files are generated. So if something goes wrong, you roll back the database
update.

Another option would be to use an integer "batch_id" column instead of a
boolean. Then you could recreate a previous batch if the file is subsequently
lost. An integer takes more space than a boolean, but due to alignment issues
it often works out the same.

> We could live with 2, although it makes it impossible to test new
> replication mechanisms without adding additional columns for each.

Well, four boolean columns would, depending on the rest of the table layout,
probably take the same space as one anyway, due to alignment issues.

> 3 is also a major consideration, it makes everybody's life easier if we can
> avoid updates being made to the db by the replicator.

Yeah, having to update every record once would have a major impact. It would
mean twice as many tuples in the table and in every index (except the partial
index). Vacuum would be a major issue for avoiding bloat in both the table and
the indexes.

I'm trying to see how to do it without either updating rows or inserting into
a separate queue table, but I'm not immediately seeing a good solution.

I once did something similar to what you're doing now, but there the
transactions were simple inserts; we just selected all the work for the
previous hour at 5 minutes past the hour and were done with it. In your case
you would have to delay replication by 30+ minutes.
Implementing what you describe with txid would depend on a lot of internal
logic which could change in future releases. It would also tie you to never
updating or deleting rows in the updates, which seems like it might be a
problem in the future.

--
Gregory Stark
http://mit.edu/~gsstark/resume.pdf

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general