Re: index scan backward plan question

Kevin Murphy <murphy@xxxxxxxxxxxxxxx> · Thu, 23 Mar 2006 16:15:49 -0500

> Neil Conway wrote:
> There is no such thing as "PG 8.3".

I meant 8.1.3

> please provide the queries that trigger the problem and the
> relevant schema definitions.

Sorry about not posting more details initially.  I was running out the 
door and was hasty.

> Try re-running ANALYZE and retrying EXPLAIN ANALYZE.

It was analyzed (and unmodified since analysis).  I also rebuilt the 
table from scratch - same results.

I'm beginning to think my performance problem is mostly related to tight 
memory.  The same query that was insanely slow on one machine (1GB RAM) 
was reasonably fast on another with more memory (8GB RAM).  Also, when I 
freed a bit more memory on the original machine, the performance wall 
went away.  PostgreSQL seems to be making OK decisions about when to use 
the 'index scan backward' plan, but under tight memory conditions, the 
index scan would take forever while the alternative would be fine.

My curiosity has sunk below my need to get real work done, so I don't 
require any further response.

If anybody has too much time on his or her hands, you can read what I 
wrote (below) before I decided it wasn't worth pursuing this any further.

Thanks,
Kevin Murphy

\d merged
     Table "public.merged"
  Column    |  Type   | Modifiers
--------------+---------+-----------
pmid         | integer |
first_author | text    |
authors      | text    |
title        | text    |
journal      | text    |
date         | date    |
display_date | text    |
volume       | text    |
issue        | text    |
pages        | text    |
symbol       | text    |
weight       | real    |
Indexes:
  "merged_data_idx" btree (date, first_author)
  "merged_first_author_idx" btree (first_author)
  "merged_pmid_idx" btree (pmid)
  "merged_symbol_idx" btree (symbol)
  "merged_weight_date_idx" btree (weight, date)

The query in question searches on 'symbol', orders by the combination of 
'weight' and 'date', and constrains by a limit and offset.  The table 
has 7,839,274 rows; 16,247 distinct symbol values; 17,279 distinct 
weight values; 831,007 distinct weight-date combinations.

Depending on the query search term and offset value, one of two 
different plans is chosen.  For search terms matching large numbers of 
rows, and with a low enough offset, an 'index scan backward' plan is 
chosen, which is sometimes extremely fast (faster than the other plan).  
However, for the search terms matching very large numbers of rows, the 
'index scan backward' plan can be molasses-slow (as much as 2 and 3 
orders of magnitude).  During such slow queries, the CPU is not doing 
much: postgres is consuming <=3% of CPU, and nothing else is happening 
on the box ('top' is using the most CPU).  Disk I/O is consistent and 
low at around 1MB/sec while this is going on.  Top doesn't show any 
swapping, although free physical memory is very low, but on the other 
hand there is lots of inactive memory (720MB).  During these very long 
queries, I see a lock on merged_weight_date_idx (which is 168 MB in 
size, BTW) the whole time.  There are no messages in the system or 
postgresql logs.  Can it really be taking PG 490 seconds (see below) to 
read some fraction of 168MB into memory?

* * * * * * * *

postgresql.conf settings:

shared_buffers = 15000
work_mem = 32768
maintenance_work_mem = 65536
default_statistics_target = 1000

I just noticed that default_statistics_target is set at one extreme.  
Since many of the columns have large numbers of values, highly skewed 
distributions, and the oddballs are often important, it was my 
understanding that this was not unreasonable (and early tests seemed to 
bear this out).  I stand ready to be corrected.

* * * * * * * *

OS:

OS X 10.4, 1 GB RAM, dual 1.8 Ghz PPC

* * * * * * * *

The following plan is produced for almost all values of 'symbol' except 
the values with the most records.  It is also produced for the 
high-frequency symbols if the OFFSET is large enough.  The following is 
a search for a less common symbol than the problematic cases ('CFTR' 
matches 8920 rows):

EXPLAIN ANALYZE SELECT pmid, authors, title, journal, date, 
display_date, volume, issue, pages
FROM merged
WHERE symbol = 'CFTR'
ORDER BY weight DESC, date DESC
OFFSET 0 LIMIT 25;
                                                                QUERY 
PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=29364.77..29364.83 rows=25 width=229) (actual 
time=1351.026..1351.040 rows=25 loops=1)
 ->  Sort  (cost=29364.77..29385.54 rows=8310 width=229) (actual 
time=1351.023..1351.028 rows=25 loops=1)
       Sort Key: weight, date
       ->  Bitmap Heap Scan on merged  (cost=55.08..28823.76 rows=8310 
width=229) (actual time=60.076..1198.231 rows=8920 loops=1)
             Recheck Cond: (symbol = 'CFTR'::text)
             ->  Bitmap Index Scan on merged_symbol_idx  
(cost=0.00..55.08 rows=8310 width=0) (actual time=39.128..39.128 
rows=8920 loops=1)
                   Index Cond: (symbol = 'CFTR'::text)
Total runtime: 1357.684 ms

* * * * * * * *

In contrast, low-offset searches for the symbol 'ERVK6' (which matches 
142814 rows) produce a different plan, based on 'index scan backward'.  
For symbol 'ERVK6', I found that offset values of 1234 and less produce 
this plan (with a limit of 25).  Offset 1235 and above produce the 
non-"index scan backward" plan.

Here's the plan for symbol = 'ERVK6', OFFSET = 1234:

EXPLAIN ANALYZE SELECT pmid, authors, title, journal, date, 
display_date, volume, issue, pages
FROM merged
WHERE symbol = 'ERVK6'
ORDER BY weight DESC, date DESC
OFFSET 1234 LIMIT 25;

QUERY 
PLAN                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=272879.06..278407.40 rows=25 width=229) (actual 
time=489694.717..489893.358 rows=25 loops=1)
 ->  Index Scan Backward using merged_weight_date_idx on merged  
(cost=0.00..31295954.88 rows=141525 width=229) (actual 
time=4.974..489891.348 rows=1259 loops=1)
       Filter: (symbol = 'ERVK6'::text)
Total runtime: 489893.584 ms

* * * * * * * *

Here's the plan for the same symbol with OFFSET = 1235:

EXPLAIN ANALYZE SELECT pmid, authors, title, journal, date, 
display_date, volume, issue, pages
FROM merged
WHERE symbol = 'ERVK6'
ORDER BY weight DESC, date DESC
OFFSET 1235 LIMIT 25;

QUERY 
PLAN                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=278441.55..278441.61 rows=25 width=229) (actual 
time=5940.268..5940.299 rows=25 loops=1)
 ->  Sort  (cost=278438.46..278792.27 rows=141525 width=229) (actual 
time=5938.818..5940.130 rows=1260 loops=1)
       Sort Key: weight, date
       ->  Bitmap Heap Scan on merged  (cost=919.34..245370.50 
rows=141525 width=229) (actual time=71.680..1169.222 rows=142814 loops=1)
             Recheck Cond: (symbol = 'ERVK6'::text)
             ->  Bitmap Index Scan on merged_symbol_idx  
(cost=0.00..919.34 rows=141525 width=0) (actual time=69.068..69.068 
rows=142814 loops=1)
                   Index Cond: (symbol = 'ERVK6'::text)
Total runtime: 6692.934 ms

The difference doesn't seem to be due to disk caching.  I can bounce 
back and forth between the two queries getting consistent relative 
timings (slow/fast/slow/fast).