Re: more problems with count(*) on large table

Mike Charnoky <noky@xxxxxxxxxxx> · Tue, 02 Oct 2007 12:48:35 -0400

The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM.  The io
subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware
9500S-8 controller (Seagate Nearline ST3400632NS drives).  Currently,
shared_buffers is set to 50000 (nearly 400M)

As for the data stored in this large table, there are 15 columns.  Each
row takes roughly 134 bytes to store, not counting the index.  So, for
one day's worth of data we are talking about 1.5G/day (1.8G with the
index).  That's about 11.5M rows/day.  Although the data isn't stored
exactly sequentially by the indexed time field, it is pretty close.

If it takes PG ~40 minutes to count(*) one day's worth of records, the
avg throughput is 786k/s.  Watching iostat during the count(*)
operation, I see average read speeds in the range of 1100-1500k/s.

I guess I would expect postgres to perform a count(*) faster.  When I
run benchmarks on the machine with hdparm (with the db insert process
running), I see the disk averages > 80MB/sec for reads

# hdparm -tT /dev/sdb1
/dev/sdb1:
 Timing cached reads:   3884 MB in  2.00 seconds = 1942.85 MB/sec
 Timing buffered disk reads:  248 MB in  3.01 seconds =  82.49 MB/sec

Maybe PG has to do a lot of random disk access?  I'm running bonnie++
now to get more detailed disk performance info.  As Tomasz pointed out
maybe using CLUSTER would help, but this probably takes a long time to
perform.

Again, the only other thing happening with the db: a separate process is
inserting data into this table.  I have checkpoint_segments set to 64 so
that pg is not constantly thrashing the disk with writes.  The
transaction log is on a separate disk.

Mike

Bill Moran wrote:
> In response to Mike Charnoky <noky@xxxxxxxxxxx>:
> 
>> This is strange... count(*) operations over a period of one day's worth
>> of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
>> first time the data is queried it takes about 40 minutes.  If I try the
>> query again, it finishes in 1-2 minutes!
> 
> This sounds like a caching issue.  My guess at what's happening is that
> other operations are pushing this data out of the shared_buffers, so
> when you run it, the system has to pull a bunch of tuples off the disk
> to check them.  If you run it again immediately, the tuples are still in
> memory, and it runs very fast.
> 
> If this is the case, you can speed up things by adding RAM/shared_buffers,
> or by moving to faster disks.  The RAM solution is going to give you the
> biggest performance improvement.
> 
> However, if there's enough other data on this system, you may have
> difficulty getting enough RAM to mitigate the problem, in which case,
> faster disks are going to be your best bet.
> 
> How much RAM do you have, and how much of it is allocated to shared_buffers?
> What's your IO subsystem look like?

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings