Re: Performance of count(*)

Steve Atkins <steve@xxxxxxxxxxx> · Thu, 22 Mar 2007 10:53:24 -0700

On Mar 22, 2007, at 10:21 AM, Craig A. James wrote:

Tino Wildenhain wrote:
Craig A. James schrieb:
...
In our case (for a variety of reasons, but this one is critical),  
we actually can't use Postgres indexing at all -- we wrote an  
entirely separate indexing system for our data...

...There is no need to store or
maintain this information along with postgres when you can store
and maintain it directly in postgres as well.

Whether we store our data inside or outside Postgres misses the  
point (in fact, most of our data is stored IN Postgres).  It's the  
code that actually performs the index operation that has to be  
external to Postgres.

On top of that, postgres has a very flexible and extensible index
system.

You guys can correct me if I'm wrong, but the key feature that's  
missing from Postgres's flexible indexing is the ability to  
maintain state across queries.  Something like this:

 select a, b, my_index_state() from foo where ...
   offset 100 limit 10 using my_index(prev_my_index_state);

The my_index_state() function would issue something like a  
"cookie", an opaque text or binary object that would record  
information about how it got from row 1 through row 99.  When you  
issue the query above, it could start looking for row 100 WITHOUT  
reexamining rows 1-99.

This could be tricky in a OLTP environment, where the "cookie"  
could be invalidated by changes to the database.  But in warehouse  
read-mostly or read-only environments, it could yield vastly  
improved performance for database web applications.

If I'm not mistaken, Postgres (nor Oracle, MySQL or other RDBMS)  
can't do this.  I would love to be corrected.

As long as you're ordering by some row in the table then you can do  
that in
straight SQL.

select a, b, ts from foo where (stuff) and foo > X order by foo limit 10

Then, record the last value of foo you read, and plug it in as X the  
next
time around.

This has the advantage over a simple offset approach of actually
displaying all the data as a user pages through it too. (Consider
the case where the user is viewing offsets 91-100, and you delete
the record at offset 15. The user goes to the next page and will
miss the record that used to be at offset 101 and is now at offset
100).

The problem is that relational databases were invented before the  
web and its stateless applications.  In the "good old days", you  
could connect to a database and work for hours, and in that  
environment cursors and such work well -- the RDBMS maintains the  
internal state of the indexing system.  But in a web environment,  
state information is very difficult to maintain.  There are all  
sorts of systems that try (Enterprise Java Beans, for example), but  
they're very complex.

I think the problem is more that most web developers aren't very good
at using the database, and tend to fall back on simplistic, wrong,  
approaches
to displaying the data. There's a lot of monkey-see, monkey-do in web
UI design too, which doesn't help.

Cheers,
  Steve