On Tue, 12 Oct 2010, Craig Ringer wrote:
On 10/12/2010 04:22 PM, david@xxxxxxx wrote:
from a PR point of view, speeding up the trivil count(*) case could be
worth it, just to avoid people complaining about it not being fast.
At the cost of a fair bit more complexity, though, and slowing everything
else down.
complexity probably, although given how complex the planner is already is
this significant?
as far as slowing everything else down, why would it do that? (beyond the
simple fact that any new thing the planner can do makes the planner take a
little longer)
David Lang
The proper solution here remains, IMO, support for visibility information in
indexes, whether by storing it once in the index and once in the heap
(ouch!), storing it out-of-line, or using a covering index where one or more
columns are stored wholly in the index not in the table heap at all.
Here are a few of the many past discussions about this that have already
covered some of the same ground:
http://stackoverflow.com/questions/839015/postgres-could-an-index-organized-tables-paved-way-for-faster-select-count-fr
http://osdir.com/ml/db.postgresql.performance/2003-10/msg00075.html
(and the rest of the thread)
A decent look with Google will find many, many more.
in the case where you are doing a count(*) where query and the where is
on an indexed column, could the search just look at the index + the
visibility mapping rather than doing an sequential search through the
table?
Nope, because the visibility map, which is IIRC only one bit per page,
doesn't record how many tuples there are on the page, or enough information
about them to determine how many of them are visible to the current
transaction*.
as for your worries about the accuracy of a visibility based count in
the face of other transactions, wouldn't you run into the same issues if
you are doing a sequential scan with the same transactions in process?
No. Every tuple in a table heap in postgresql has hidden fields, some of
which are used to determine whether the current transaction* can "see" the
tuple - it may have been inserted after this transaction started, or deleted
before this transaction started, so it's not visible to this transaction but
may still be to others.
http://www.postgresql.org/docs/current/static/ddl-system-columns.html
This information isn't available in the visibility map, or in indexes. That's
why PostgreSQL has to hit the heap to find it.
* current transaction should really be "current snapshot". The snapshot is
taken at the start of the whole transaction for SERIALIZABLE isolation, and
at the start of each statement for READ COMMITTED isolation.
--
Craig Ringer
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance