Re: Slow count(*) again...

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Sun, 10 Oct 2010 23:30:09 +0800

On 10/10/2010 6:29 PM, Neil Whelchel wrote:
On the other hand, I copied a table out of one of my production servers that
has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
text). The first numeric column has numbers evenly spread between 0 and 100
and it is indexed. I put the table in a pair of database servers both running
on the same physical hardware. One server is Postgres, the other is a popular
server (I am not mentioning names here).

Please do. Your comment is pretty meaningless otherwise.

If you're talking about MySQL: Were you using InnoDB or MyISAM table 
storage? Of course it's fast with MyISAM, it relies on locks to do 
updates and has bugger all capability for write concurrency, or to 
permit readers while writing is going on.

If you're using InnoDB, then I'd like to know how they've managed that.

If you're talking about some *other* database, please name it and 
provide any useful details, because the hand waving is not helpful.

> I don't think that any amount of settings
tweaking will bring them even in the same ball park.

If you are, in fact, comparing MySQL+MyISAM and PostgreSQL, then you're 
quite right. Pg will never have such a fast count() as MyISAM does or 
the same insanely fast read performance, and MyISAM will never be as 
reliable, robust or concurrency-friendly as Pg is. Take your pick, you 
can't have both.

There has been discussion
about the other server returning an incorrect count because all of the indexed
rows may not be live at the time. This is not a problem for the intended use,
that is why I suggested another function like estimate(*). It's name suggests
that the result will be close, not 100% correct, which is plenty good enough
for generating a list of results pages in most cases.

Do you have any practical suggestions for generating such an estimate, 
though? I find it hard to think of any way the server can do that 
doesn't involve executing the query. The table stats are WAY too general 
and a bit hit-and-miss, and there isn't really any other way to do it.

If all you want is a way to retrieve both a subset of results AND a 
count of how many results would've been generated, it sounds like all 
you really need is a way to get the total number of results returned by 
a cursor query, which isn't a big engineering challenge. I expect that 
in current Pg versions a trivial PL/PgSQL function could be used to 
slurp and discard unwanted results, but a better in-server option to 
count the results from a cursor query would certainly be nice.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance