Re: count * performance issue

Arjen van der Meijden <acmmailing@xxxxxxxxxxxx> · Sat, 08 Mar 2008 09:04:31 +0100

On 6-3-2008 16:28 Craig James wrote:
On the one hand, I understand that Postgres has its architecture, and I 
understand the issue of row visibility, and so forth.  On the other 
hand, my database is just sitting there, nothing going on, no 
connections except me, and... it takes FIFTY FIVE SECONDS to count 20 
million rows, a query that either Oracle or MySQL would answer in a 
fraction of a second.  It's hard for me to believe there isn't a better 
way.

Can you explain to me how you'd fit this in a fraction of a second?

mysql> select count(*) from messages;
+----------+
| count(*) |
+----------+
| 21908505 |
+----------+
1 row in set (8 min 35.09 sec)

This is a table containing the messages on forumtopics and is therefore 
relatively large. The hardware is quite beefy for a forum however (4 
3Ghz cores, 16GB, 14+1 disk raid5). This table has about 20GB of data.

If I use a table that contains about the same amount of records as the 
above and was before this query probably much less present in the 
innodb-buffer (but also less frequently touched by other queries), we 
see this:

mysql> select count(*) from messagesraw;
+----------+
| count(*) |
+----------+
| 21962804 |
+----------+
1 row in set (5 min 16.41 sec)

This table is about 12GB.

In both cases MySQL claimed to be 'Using index' with the PRIMARY index, 
which for those tables is more or less identical.

Apparently the time is still table-size related, not necessarily 
tuple-count related. As this shows:

mysql> select count(*) from articlestats;
+----------+
| count(*) |
+----------+
| 34467246 |
+----------+
1 row in set (54.14 sec)

that table is only 2.4GB, but contains 57% more records, although this 
was on another database on a system with somewhat different specs (8 
2.6Ghz cores, 16GB, 7+7+1 raid50), used a non-primary index and I have 
no idea how well that index was in the system's cache prior to this query.

Repeating it makes it do that query in 6.65 seconds, repeating the 
12GB-query doesn't make it any faster.

Anyway, long story short: MySQL's table-count stuff also seems 
table-size related. As soon as the index it uses fits in the cache or it 
doesn't have to use the primary index, it might be a different story, 
but when the table(index) is too large to fit, it is quite slow.
Actually, it doesn't appear to be much faster than Postgresql's (8.2) 
table-based counts. If I use a much slower machine (2 2Ghz opterons, 8GB 
ddr memory, 5+1 old 15k rpm scsi disks in raid5) with a 1GB, 13M record 
table wich is similar to the above articlestats, it is able to return a 
count(*) in 3 seconds after priming the cache.

If you saw instantaneous results with MySQL, you have either seen the 
query-cache at work or where using myisam. Or perhaps with a fast 
system, you had small tuples with a nice index in a nicely primed cache.

Best regards,

Arjen

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance