Re: count * performance issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6-3-2008 16:28 Craig James wrote:
On the one hand, I understand that Postgres has its architecture, and I understand the issue of row visibility, and so forth. On the other hand, my database is just sitting there, nothing going on, no connections except me, and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that either Oracle or MySQL would answer in a fraction of a second. It's hard for me to believe there isn't a better way.

Can you explain to me how you'd fit this in a fraction of a second?

mysql> select count(*) from messages;
+----------+
| count(*) |
+----------+
| 21908505 |
+----------+
1 row in set (8 min 35.09 sec)

This is a table containing the messages on forumtopics and is therefore relatively large. The hardware is quite beefy for a forum however (4 3Ghz cores, 16GB, 14+1 disk raid5). This table has about 20GB of data.

If I use a table that contains about the same amount of records as the above and was before this query probably much less present in the innodb-buffer (but also less frequently touched by other queries), we see this:

mysql> select count(*) from messagesraw;
+----------+
| count(*) |
+----------+
| 21962804 |
+----------+
1 row in set (5 min 16.41 sec)

This table is about 12GB.

In both cases MySQL claimed to be 'Using index' with the PRIMARY index, which for those tables is more or less identical.

Apparently the time is still table-size related, not necessarily tuple-count related. As this shows:

mysql> select count(*) from articlestats;
+----------+
| count(*) |
+----------+
| 34467246 |
+----------+
1 row in set (54.14 sec)

that table is only 2.4GB, but contains 57% more records, although this was on another database on a system with somewhat different specs (8 2.6Ghz cores, 16GB, 7+7+1 raid50), used a non-primary index and I have no idea how well that index was in the system's cache prior to this query.

Repeating it makes it do that query in 6.65 seconds, repeating the 12GB-query doesn't make it any faster.

Anyway, long story short: MySQL's table-count stuff also seems table-size related. As soon as the index it uses fits in the cache or it doesn't have to use the primary index, it might be a different story, but when the table(index) is too large to fit, it is quite slow. Actually, it doesn't appear to be much faster than Postgresql's (8.2) table-based counts. If I use a much slower machine (2 2Ghz opterons, 8GB ddr memory, 5+1 old 15k rpm scsi disks in raid5) with a 1GB, 13M record table wich is similar to the above articlestats, it is able to return a count(*) in 3 seconds after priming the cache.

If you saw instantaneous results with MySQL, you have either seen the query-cache at work or where using myisam. Or perhaps with a fast system, you had small tuples with a nice index in a nicely primed cache.

Best regards,

Arjen

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux