Re: count(*) and bad design was: Experiences with extensibility

Sim Zacks <sim@xxxxxxxxxxxxxx> · Thu, 10 Jan 2008 09:06:50 +0200

It would be an administrative nightmare unless you had very few where clauses 
that you were tracking.

Instead of using a trigger, you could use Listen/Notify to call a daemon on the 
server to run the procedure and then you have no insert/delete overhead.
Or you could call the function on a cron job every 10 minutes...

Chris Browne wrote:
zb@xxxxxxxxxxx (Zoltan Boszormenyi) writes:
which will be fast and depending on the initial value of COUNT(*)
it will be very close to the exact figure. You can extend the example
with more columns if you know your SELECT COUNT(*) ... WHERE
conditions in advance but this way you have to keep several administrative
tables for different monitored tables. Again, this trades some disk space
and INSERT/DELETE operation speed on the monitored tables for
quicker count.

Actually, this approach will be Really Terrible for any cases where
multiple connections are adding/deleting tuples concurrently, as it
will force ALL updates to serialize behind the update to the central
table.

Occasionally, you'll have something even worse, namely a deadlock,
where two or more of the updates fighting over the single summary
tuple fall into a bad state, and one of them is forced to give up,
potentially rolling back its whole transaction.

[Waving hands for a moment]

What I would do *instead* would be for each INSERT to add a tuple with
a count of 1, and for each DELETE to add a tuple with count of -1, and
then to periodically have a single process walk through to summarize
the table.  There may be a further optimization to be had by doing a
per-statement trigger that counts the number of INSERTs/DELETEs done,
so that inserting 30 tuples (in the table being tracked) leads to
adding a single tuple with count of 30 in the summary table.

That changes the tradeoffs, again...

 - Since each INSERT/DELETE is simply doing an INSERT into the summary
   table, the ongoing activity is *never* blocking anything

 - You get the count by requesting 
    SELECT SUM(rec_cnt) as record_count from rec_count where tablename = 'foo';

 - Note that the query is MVCC-consistent with the table!

 - Once in a while, you'll want to run a single procedure that, for
   each table, deletes all the existing records, and replaces them
   with a single one consisting of the sum of the individual values.

 - You can re-sync a table by running the query:
     begin;
        delete from record_count where tablename = 'foo';
        insert into record_count(tablename, rec_cnt) select 'foo', (select count(*) from foo);
     commit;

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org/