Kevin Galligan wrote, On 29-10-08 23:35:
An example of a slow query is...
select count(*) from bigdatatable where age between 22 and 40 and state = 'NY';
explain analyze returned the following...
Aggregate  (cost=5179639.55..5179639.56 rows=1 width=0) (actual time=389529.895..389529.897 rows=1 loops=1)
  ->  Bitmap Heap Scan on bigdatatable  (cost=285410.65..5172649.63 rows=2795968 width=0) (actual time=6727.848..387159.175 rows=2553273 loops=1)
        Recheck Cond: ((state)::text = 'NY'::text)
        Filter: ((age >= 22) AND (age <= 40))
        ->  Bitmap Index Scan on idx_jstate  (cost=0.00..284711.66 rows=15425370 width=0) (actual time=6298.950..6298.950 rows=16821828 loops=1)
              Index Cond: ((state)::text = 'NY'::text)
Total runtime: 389544.088 ms
It looks like the index scans are around 6 seconds or so each, but then
the bitmap heap scan and aggregate jump up to 6 minutes.
Indeed. The cause is that PostgreSQL must visit the table data itself to
verify that each row is visible to the current transaction. This means A LOT
of data must be retrieved from disk.
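To get a feel for how much heap data that scan has to touch, you can check
the table's size in the catalog (just a diagnostic sketch; the numbers are
whatever your pg_class currently reports):

  -- Pages (8 kB each) and estimated rows for the table being scanned
  SELECT relpages,
         reltuples,
         pg_size_pretty(pg_relation_size('bigdatatable')) AS on_disk_size
  FROM pg_class
  WHERE relname = 'bigdatatable';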
The only real thing you can do is reduce I/O load, by reducing the
amount of data that must be traversed (or ensuring the data is stored
closely together, but that's really hard to get right). This requires
optimizing your database design for that single goal.
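For instance (just a sketch; the index name is made up and whether it pays
off depends on your workload), a composite index on (state, age) lets the
bitmap index scan apply both conditions, and clustering the table on it puts
matching rows on the same pages:

  -- Hypothetical composite index covering both filter columns
  CREATE INDEX idx_state_age ON bigdatatable (state, age);

  -- Rewrite the table in index order so rows for one state sit close together.
  -- Note: CLUSTER takes an exclusive lock and rewrites the whole table
  -- (8.3 syntax shown; older releases use CLUSTER idx_state_age ON bigdatatable).
  CLUSTER bigdatatable USING idx_state_age;
  ANALYZE bigdatatable;

Clustering is a one-off, though; newly inserted rows are not kept in that
order, which is part of why it is hard to get right.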
This will not make it scale any better than it currently does, however.
The query will scale O(N) with the size of your table; you need other
techniques to do better.
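One such technique, sketched here with made-up names, is to maintain a
pre-aggregated summary table and answer the count from that instead of the
250-million-row table (keeping it current, via triggers or a periodic
rebuild, is the hard part):

  -- Hypothetical rollup table: one row per (state, age) combination
  CREATE TABLE bigdatatable_counts AS
    SELECT state, age, count(*) AS n
    FROM bigdatatable
    GROUP BY state, age;

  CREATE INDEX idx_counts_state_age ON bigdatatable_counts (state, age);

  -- The original query becomes a sum over at most a few thousand rows
  SELECT sum(n)
  FROM bigdatatable_counts
  WHERE state = 'NY' AND age BETWEEN 22 AND 40;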
Another thing is spending extra money on hardware that can sustain
higher I/O seek rates (more and/or faster spindles).
- Joris
More detail on the table design and other stuff in a bit...
On Wed, Oct 29, 2008 at 6:18 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
On Wed, Oct 29, 2008 at 2:18 PM, Kevin Galligan <kgalligan@xxxxxxxxx> wrote:
> I'm approaching the end of my rope here. I have a large database.
> 250 million rows (ish). Each row has potentially about 500 pieces of
> data, although most of the columns are sparsely populated.
A couple of notes here. PostgreSQL stores null values as a single bit
in a bit field, making sparsely populated tables quite efficient as
long as you store the non-existent values as null and not '' or some
other real value.
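A quick way to see that (the table name and values here are made up, purely
for illustration) is to compare the stored row size with NULLs versus empty
strings:

  -- NULLs live in the per-row null bitmap; empty strings are real values
  CREATE TEMP TABLE sparse_demo (a int, b text, c text, d text);
  INSERT INTO sparse_demo VALUES (1, NULL, NULL, NULL);  -- sparse row
  INSERT INTO sparse_demo VALUES (2, '', '', '');        -- slightly larger row
  SELECT a, pg_column_size(sparse_demo.*) AS row_bytes FROM sparse_demo;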
Have you run explain analyze on your queries yet? Pick a slow one,
run explain analyze on it and post it and we'll see what we can do.
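For example, for a count like the one shown at the top of this thread:

  -- EXPLAIN ANALYZE actually runs the query and reports per-node timings
  EXPLAIN ANALYZE
  SELECT count(*) FROM bigdatatable
  WHERE age BETWEEN 22 AND 40 AND state = 'NY';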