On 6/11/09, Brett Henderson <brett@xxxxxxxxxx> wrote: > Marko Kreen wrote: > > > 4-byte xids on btree may create data corruption. > > > > > Can you be more specific on this? I'm aware of xid being an unsigned > integer which means we need to deal with the cast resulting in negative > numbers. This means we have to split our range queries into several pieces > when crossing overflow points. But I'm not aware of any other issues. No, the issue is unstable '<' and '>' relationships between values. > > Solution is to use 8-byte txids via txid_current() for indexing. [1] > > > > > The main issue with 8-byte values are the size. When multipled by 500 > million rows it turns into a lot of space. I'd prefer to use int4 if it's > possible ... If you use PgQ, you dont need any extra column in table. As the room below PgQ is garbage-collected pretty quickly the extra space will not be a problem. > Sorry, I'm not sure what you're suggesting with txid_current(). We're > currently using the |txid_current_snapshot|() method which returns us the > maximum transaction id plus in-flight transactions. We specifically exclude > transactions that are in-flight from the query, then include them on > subsequent queries when they have committed. My suggestion was to use int8 txid column for querying, either on table (bit expensive indeed), or on queue tables (pgq). > > See pgq.batch_event_sql() function in Skytools [2] for how to > > query txids between snapshots efficiently and without being affected > > by long transactions. > > > > > I'll take a look. > > > In fact perhaps you can use PgQ directly instead building your own. > > It is built quite similarly to what you are planning - periodic > > snapshots and then queries on txids to get the data. > > > > > I must admit I'm not very familiar with PgQ, but I did take a quick look > when evaluating options. I'm possibly misguided here but serialising > everything via a queue doesn't seem like the most efficient way of > replicating large changesets. Most efficient is COPY-ing of whole table. If you want to replicate piecemeal, only changes that happened since last sync point, PgQ is quite efficient, compared to trying to read changes from main table: - Only small amount of recent data is duplicated, older data is cleaned up quite fast (configurable). - The extra index is small as it need to cover only small amount of data. > Another issue is that it would only allow > read-once behaviour that prevents a client from re-replicating if something > goes wrong. PgQ will give you exact same batch repeatedly, until you call pgq.finish_batch(). This allows you to coordinate commits between differnet databases. > How would you trigger message creation using PgQ? Would you typically use > triggers on the tables to be replicated? http://skytools.projects.postgresql.org/pgq/files/triggers-sql.html -- marko -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general