On 10/29/2010 9:49 AM, Merlin Moncure wrote:
On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl.pickett@xxxxxxxxx> wrote:
Hello Postgres Hackers,
We have a simple 'event log' table that is insert only (by multiple
concurrent clients). It has an integer primary key. We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database. The problem is, this simple approach has a race
that will forever skip uncommitted events. I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits. How can we solve this? Basically
it's a phantom row problem but it spans transactions.
I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value. I don't get it. All I want to do is make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?
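To make the race concrete, here is a rough timeline of how the skip can happen (the session labels, ids, and timing are made up for illustration):

    session A: begin; insert into events ...;   -- sequence hands out id 4999
    session B: begin; insert into events ...;   -- sequence hands out id 5000
    session B: commit;
    reporting job: select * from events where id > 4998;   -- sees only 5000
    session A: commit;                                      -- 4999 visible only now
    -- the next run filters on id > 5000, so 4999 is skipped forever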
You don't have a sequence problem so much as a wrong implementation
problem. Sequences are always *grabbed* in order but they can hit the
table out of order and there is a time lag between when the sequence
value is generated and the transaction commits. If I issue 'begin',
insert a log record, and hold the commit for 5 minutes you are going
to skip the record because you are only looking at the last processed
record. Your algorithm is going to fail if you use a sequence,
timestamp, or gapless sequence to manage your queue position. You
need to divide your log records into two logical sets, processed and
unprocessed, and look at the set as a whole.
I would suggest staging your unprocessed records to a queue table and
having your writer consume them and move them to a processed table.
You can also look at already built queuing implementations like PGQ
written by our spectacularly skilled friends at Skype (haven't used it
myself, but I've heard it's good!).
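If it helps, a rough sketch of that staging idea (the event_queue and events_processed names are made up for the example; writers only ever insert into event_queue):

    begin;
    insert into events_processed
        select * from event_queue;
    -- delete only what was actually copied, so rows committed
    -- mid-transaction stay queued for the next run
    delete from event_queue
        where id in (select id from events_processed);
    commit;

An index on the id columns keeps that delete cheap as events_processed grows.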
merlin
Yep, you don't want a sequence. You want a flag.
Add a boolean "processed" flag and default it to false.
Then every 5 minutes run this:
begin;
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;
or, if you want to select them and do something to them:
begin;
select * from events where processed = false;
... do your processing on each row, which would include inserting it ...
update events set processed = true where processed = false;
commit;
Just make sure you do it all in the same transaction, so the update sees
the exact same set as the select. At the default read committed level each
statement takes its own snapshot, so a row committed in between could get
flagged without being copied; running the transaction serializable avoids that.
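A sketch of that, reusing the same statements as the first example (under snapshot isolation both statements see one snapshot, so rows committed mid-transaction simply wait for the next run):

    begin isolation level serializable;
    insert into logged select * from events where processed = false;
    update events set processed = true where processed = false;
    commit;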
You could also create a partial index on processed so that only the rows
that are still false are indexed.
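For example (the index name is made up):

    create index events_unprocessed_idx on events (id) where processed = false;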
-Andy