Re: Slow count(*) again...

Neil Whelchel <neil.whelchel@xxxxxxxxx> · Tue, 12 Oct 2010 23:47:19 -0700

On Sunday 10 October 2010 21:15:56 Neil Whelchel wrote:

> Right now, I am building a test machine with two dual core Intel processors
> and two 15KRPM mirrored hard drives, 1 GB ram. I am using a small amount of
> ram because I will be using small test tables. I may do testing in the
> future with more ram and bigger tables, but I think I can accomplish what
> we are all after with what I have. The machine will be limited to running
> the database server in test, init, bash, and ssh, no other processes will
> be running except for what is directly involved with testing. I will post
> exact specs when I post test results. I will create some test tables, and
> the same tables will be used in all tests. Suggestions for optimal
> Postgres and system  configuration are welcome. I will try any suggested
> settings that I have time to test. -Neil-
> 

Ok the test machine is up and running:
A few more details, the hard drives are SCSI Ultra-320, the CPUs are 2.8 GHZ, 
533 MHZ FSB. I wanted to make a more memory cramped machine to keep the table 
to RAM ratio closer to the production machines, but for now, all I have are 
1GB DDRs, and the machine requires pairs, so total memory is 2GB. Swap is 
turned off.

The data I will be using is a couple of days of raw data from a production 
system. The columns of interest are numeric and timestamp. I will use the 
exact same data for all tests.

                           Table "public.log"
      Column      |            Type             |       Modifiers        
------------------+-----------------------------+------------------------
 batch_id         | integer                     | 
 t_stamp          | timestamp without time zone | not null default now()
 raw_data         | numeric                     | 
 data_value       | numeric                     | 
 data_value_delta | numeric                     | 
 journal_value    | numeric                     | 
 journal_data     | numeric                     | 
 machine_id       | integer                     | not null
 group_number     | integer                     | 
Indexes:
    "log_idx" btree (group_number, batch_id)
    "log_oid_idx" btree (oid)
    "log_t_stamp" btree (t_stamp)

The initial test is with XFS with write barriers turned on, this makes for 
very slow writes. The point of the first test is to get a baseline of 
everything out-of-the-box. So, here are the numbers:

Insert the data into one table:
crash:~# time psql -U test test -q < log.sql
real    679m43.678s
user    1m4.948s
sys     13m1.893s

crash:~# echo 3 > /proc/sys/vm/drop_caches
crash:~# time psql -U test test -c "SELECT count(*) FROM log;" 
  count   
----------
 10050886
(1 row)

real    0m11.812s
user    0m0.000s
sys     0m0.004s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;" 
  count   
----------
 10050886
(1 row)

real    0m3.737s
user    0m0.000s
sys     0m0.000s

As can be seen here, the cache helps..
And the numbers are not all that bad, so let's throw a sabot into the gears:
crash:~# time psql -U test test -c "UPDATE log SET raw_data=raw_data+1"
UPDATE 10050886

real    14m13.802s
user    0m0.000s
sys     0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count   
----------
 10050886
(1 row)

real    3m32.757s
user    0m0.000s
sys     0m0.000s

Just to be sure:
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count   
----------
 10050886
(1 row)

real    2m38.631s
user    0m0.000s
sys     0m0.000s

It looks like cache knocked about a minute off, still quite sad.
So, I shutdown Postgres, ran xfs_fsr, and started Postgres:
crash:~# echo 3 > /proc/sys/vm/drop_caches
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count   
----------
 10050886
(1 row)

real    1m36.304s
user    0m0.000s
sys     0m0.000s

So it seems that defragmentation knocked another minute off:
Let's see how much cache helps now:
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count   
----------
 10050886
(1 row)

real    1m34.873s
user    0m0.000s
sys     0m0.000s

Not much... And we are a long way from the 3.7 seconds with a freshly inserted 
table. Maybe the maid can help here.
crash:~# time psql -U test test -c "VACUUM log;"
VACUUM

real    22m31.931s
user    0m0.000s
sys     0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count   
----------
 10050886
(1 row)

real    1m30.927s
user    0m0.000s
sys     0m0.000s

Nope...
So, possible conclusions are: 
1. Even with VACUUM database table speed degrades as tables are updated.
2. Time testing on a freshly INSERTed table gives results that are not real-
world. 
3. Filesystem defragmentation helps (some).
4. Cache only makes a small difference once a table has been UPDATEd.

I am going to leave this configuration running for the next day or so. This 
way I can try any suggestions and play with any more ideas that I have.
I will try these same tests on ext4 later, along with any good suggested 
tests.
I will try MySQL with the dame data with both XFS and ext4.
-Neil-

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance