Search Postgresql Archives

Re: Query planner refuses to use index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Monday 25 July 2005 15:43, Michael Fuhr pondered:
> Whatever the results of your experiments, could you post the settings
> you tried and the corresponding EXPLAIN ANALYZE outputs?

I did lots of tests now that you pointed me to a useful guide, also taking 
what's in the documentation into account. In the attached file I have 
documented my results.

There are three sections to the file, each separated by '====' markers. The 
first section deals in detail with the EXPLAIN ANALYZE info relating to the 
troublesome queries. The second is probably of least interest, just showing 
that I could implement my problem differently to improve performance.

But the last section is the most important, where I varied 
effective_cache_size, random_page_cost, shared_buffers and cpu_tuple_costs, 
each on its own with the other ones assuming default values(unless 
indicated).

To summarise, increasing effective_cache_size and decreasing random_page_cost 
both yield in lower index scan cost estimates while not changing the seqscan 
ones. As expected, increasing shared_buffers makes no difference whatsoever 
in the query cost estimates or the actual query times. A higher cpu_tuple 
cost penalises the seqscans significantly while only slightly increasing the 
index scan estimates.

Also note that these are all related to the query planner only, they do NOT 
change the actual query time which explains why I did not include EXPLAIN 
ANALYZE outputs, only plain EXPLAIN ones.

In order to make PostgreSQL choose the index scans when I need them (other 
than by setting enable_seq_scans to off), I ended up choosing
        effective_cache_size 40000
        random_page_cost 2.5
        cpu_tuple_cost 0.08
as only a combination yielded the desired results. Hardly optimal, but the 
real problem seems to lie with the correlation of the indexed columns (see 
other post in this thread). If I encounter trouble with these somewhere down 
the line, I'll post again.

Hope this helps someone out there.

-- 
Kilian Hagemann

Climate Systems Analysis Group
University of Cape Town
Republic of South Africa
Tel(w): ++27 21 650 2748
It all began with the most basic of queries on a clean database (first batch 5min data only, vacuum full analyze'd), namely

station_data=# EXPLAIN ANALYZE SELECT * FROM speed WHERE set_id=25::smallint;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Seq Scan on speed  (cost=0.00..242637.38 rows=1073843 width=8) (actual time=98080.848..104617.800 rows=1094400 loops=1)
   Filter: (set_id = 25::smallint)
 Total runtime: 109957.981 ms
(3 rows)

which chooses a seqscan by default . Disabling seqscan manually causes an index scan strategy which takes only ~ 12 sec:

station_data=# EXPLAIN ANALYZE SELECT * FROM speed WHERE set_id=25::smallint;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using speed_pkey on speed  (cost=0.00..3194177.02 rows=1073843 width=8) (actual time=90.544..6881.291 rows=1094400 loops=1)
   Index Cond: (set_id = 25::smallint)
 Total runtime: 12243.179 ms
(3 rows)

Making the query a bit more complex yields the same picture(default configuration is 9x slower):

    seqscans disabled:
        station_data=# EXPLAIN ANALYZE SELECT * FROM speed WHERE set_id=25::smallint AND rec_time >= '1999/01/01';
                                                                    QUERY PLAN
        -------------------------------------------------------------------------------------------------------------------------------------
        Index Scan using speed_pkey on speed  (cost=0.00..1932444.35 rows=649115 width=8) (actual time=0.169..6520.960 rows=652345 loops=1)
        Index Cond: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime))
        Total runtime: 11664.710 ms
        (3 rows)
    default (seqscans enabled):
        station_data=# EXPLAIN ANALYZE SELECT * FROM speed WHERE set_id=25::smallint AND rec_time >= '1999/01/01';
                                                            QUERY PLAN
        ----------------------------------------------------------------------------------------------------------------------
        Seq Scan on speed  (cost=0.00..277537.25 rows=649115 width=8) (actual time=94546.374..98789.401 rows=652345 loops=1)
        Filter: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime))
        Total runtime: 101833.815 ms
        (3 rows)

Only when adding more conditions does the index scan seem attractive for the planner, and it chooses the index no matter what:

    station_data=# EXPLAIN ANALYZE SELECT * FROM speed WHERE set_id=25::smallint AND rec_time BETWEEN '1999/01/01'AND '2000/01/01';
                                                                    QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------------
    Index Scan using speed_pkey on speed  (cost=0.00..311175.90 rows=104431 width=8) (actual time=83.423..821.657 rows=105121 loops=1)
    Index Cond: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime) AND (rec_time <= '2000-01-01 00:00:00+02'::abstime))
    Total runtime: 1310.723 ms
    (3 rows)

Now we'll use the same systematic approach, just with the combined speed/direction queries of the type we're after:

station_data=# EXPLAIN SELECT s.rec_time, wind_speed, wind_direction FROM speed s, direction d WHERE
station_data-# s.set_id = 25::smallint AND d.set_id = 26::smallint
station_data-# AND s.rec_time=d.rec_time;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Merge Join  (cost=776310.23..806022.76 rows=1623364 width=8)
   Merge Cond: ("outer".rec_time = "inner".rec_time)
   ->  Sort  (cost=388154.12..390838.72 rows=1073843 width=6)
         Sort Key: d.rec_time
         ->  Seq Scan on direction d  (cost=0.00..242635.38 rows=1073843 width=6)
               Filter: (set_id = 26::smallint)
   ->  Sort  (cost=388156.12..390840.72 rows=1073843 width=6)
         Sort Key: s.rec_time
         ->  Seq Scan on speed s  (cost=0.00..242637.38 rows=1073843 width=6)
               Filter: (set_id = 25::smallint)
(10 rows)
station_data=# SET enable_seqscan TO OFF;
station_data=# EXPLAIN ANALYZE SELECT s.rec_time, wind_speed, wind_direction FROM speed s, direction d WHERE
station_data-# s.set_id = 25::smallint AND d.set_id = 26::smallint
station_data-# AND s.rec_time=d.rec_time;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=6679390.14..6709102.67 rows=1623364 width=8) (actual time=58160.045..87179.399 rows=1094400 loops=1)
   Merge Cond: ("outer".rec_time = "inner".rec_time)
   ->  Sort  (cost=3339694.38..3342378.99 rows=1073843 width=6) (actual time=33347.556..38652.736 rows=1094400 loops=1)
         Sort Key: d.rec_time
         ->  Index Scan using direction_pkey on direction d  (cost=0.00..3194175.64 rows=1073843 width=6) (actual time=112.973..16233.524 rows=1094400 loops=1)
               Index Cond: (set_id = 26::smallint)
   ->  Sort  (cost=3339695.77..3342380.37 rows=1073843 width=6) (actual time=24812.447..29836.246 rows=1094400 loops=1)
         Sort Key: s.rec_time
         ->  Index Scan using speed_pkey on speed s  (cost=0.00..3194177.02 rows=1073843 width=6) (actual time=80.720..13457.008 rows=1094400 loops=1)
               Index Cond: (set_id = 25::smallint)
 Total runtime: 81613.626 ms
(11 rows)

Next level of complexity:

station_data=# EXPLAIN SELECT s.rec_time, wind_speed, wind_direction FROM speed s, direction d WHERE
station_data-# s.set_id = 25::smallint AND d.set_id = 26::smallint
station_data-# AND s.rec_time=d.rec_time AND s.rec_time>='1999/01/01';
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Merge Join  (cost=749717.30..768142.12 rows=981289 width=8)
   Merge Cond: ("outer".rec_time = "inner".rec_time)
   ->  Sort  (cost=361563.18..363185.97 rows=649115 width=6)
         Sort Key: s.rec_time
         ->  Seq Scan on speed s  (cost=0.00..277537.25 rows=649115 width=6)
               Filter: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime))
   ->  Sort  (cost=388154.12..390838.72 rows=1073843 width=6)
         Sort Key: d.rec_time
         ->  Seq Scan on direction d  (cost=0.00..242635.38 rows=1073843 width=6)
               Filter: (set_id = 26::smallint)
(10 rows)

station_data=# SET enable_seqscan TO OFF;
SET
station_data=# EXPLAIN ANALYZE SELECT s.rec_time, wind_speed, wind_direction FROM speed s, direction d WHERE
station_data-# s.set_id = 25::smallint AND d.set_id = 26::smallint
station_data-# AND s.rec_time=d.rec_time AND s.rec_time>='1999/01/01';
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5356164.66..5374589.48 rows=981289 width=8) (actual time=38426.142..54574.311 rows=652345 loops=1)
   Merge Cond: ("outer".rec_time = "inner".rec_time)
   ->  Sort  (cost=2016470.28..2018093.07 rows=649115 width=6) (actual time=13332.304..16505.141 rows=652345 loops=1)
         Sort Key: s.rec_time
         ->  Index Scan using speed_pkey on speed s  (cost=0.00..1932444.35 rows=649115 width=6) (actual time=15.370..4180.006 rows=652345 loops=1)
               Index Cond: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime))
   ->  Sort  (cost=3339694.38..3342378.99 rows=1073843 width=6) (actual time=20656.563..26500.827 rows=1094400 loops=1)
         Sort Key: d.rec_time
         ->  Index Scan using direction_pkey on direction d  (cost=0.00..3194175.64 rows=1073843 width=6) (actual time=44.112..9969.454 rows=1094400 loops=1)
               Index Cond: (set_id = 26::smallint)
 Total runtime: 57320.424 ms
(11 rows)

station_data=# SET enable_seqscan TO ON;
SET
station_data=# EXPLAIN SELECT s.rec_time, wind_speed, wind_direction FROM speed s, direction d WHERE
station_data-# s.set_id = 25::smallint AND d.set_id = 26::smallint
station_data-# AND s.rec_time=d.rec_time AND s.rec_time BETWEEN '1999/01/01' AND '2000/01/01';
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=708919.40..716389.03 rows=157872 width=8)
   Merge Cond: ("outer".rec_time = "inner".rec_time)
   ->  Sort  (cost=320765.28..321026.36 rows=104431 width=6)
         Sort Key: s.rec_time
         ->  Index Scan using speed_pkey on speed s  (cost=0.00..311175.90 rows=104431 width=6)
               Index Cond: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime) AND (rec_time <= '2000-01-01 00:00:00+02'::abstime))
   ->  Sort  (cost=388154.12..390838.72 rows=1073843 width=6)
         Sort Key: d.rec_time
         ->  Seq Scan on direction d  (cost=0.00..242635.38 rows=1073843 width=6)
               Filter: (set_id = 26::smallint)
(10 rows)

station_data=# SET enable_seqscan TO OFF;
SET
EXPLAIN SELECT s.rec_time, wind_speed, wind_direction FROM speed s, direction d WHERE
s.set_id = 25::smallint AND d.set_id = 26::smallint
AND s.rec_time=d.rec_time AND s.rec_time BETWEEN '1999/01/01'AND '2000/01/01';
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..1024658.27 rows=157872 width=8) (actual time=0.395..7387.471 rows=105121 loops=1)
   ->  Index Scan using speed_pkey on speed s  (cost=0.00..311175.90 rows=104431 width=6) (actual time=0.218..858.831 rows=105121 loops=1)
         Index Cond: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime) AND (rec_time <= '2000-01-01 00:00:00+02'::abstime))
   ->  Index Scan using direction_pkey on direction d  (cost=0.00..6.81 rows=2 width=6) (actual time=0.030..0.039 rows=1 loops=105121)
         Index Cond: ((d.set_id = 26::smallint) AND ("outer".rec_time = d.rec_time))
 Total runtime: 7947.613 ms
(6 rows)
5151.794
11198.190
9849.849
5148.811
5064.850
average 7.394 sec (2.5 times slower than wind_data query)

An example of the performance of an inefficient search query (condition OUTSIDE the join):

station_data=# EXPLAIN ANALYZE SELECT s.rec_time, wind_speed, wind_direction
station_data-#         FROM speed s JOIN direction d ON (
station_data(#             s.rec_time=d.rec_time
station_data(#             AND
station_data(#             s.set_id=25::smallint
station_data(#             AND
station_data(#             d.set_id=26::smallint
station_data(#             )
station_data-# WHERE s.rec_time BETWEEN '1999/01/01' AND '2000/01/01';
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=709092.45..715560.75 rows=57718 width=8) (actual time=111723.482..114421.974 rows=105121 loops=1)
   Merge Cond: ("outer".rec_time = "inner".rec_time)
   ->  Sort  (cost=320938.34..321199.55 rows=104487 width=6) (actual time=1599.714..2187.360 rows=105121 loops=1)
         Sort Key: s.rec_time
         ->  Index Scan using speed_pkey on speed s  (cost=0.00..311341.38 rows=104487 width=6) (actual time=0.151..553.042 rows=105121 loops=1)
               Index Cond: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime) AND (rec_time <= '2000-01-01 00:00:00+02'::abstime))
   ->  Sort  (cost=388154.12..390838.72 rows=1073843 width=6) (actual time=105542.478..108179.999 rows=547177 loops=1)
         Sort Key: d.rec_time
         ->  Seq Scan on direction d  (cost=0.00..242635.38 rows=1073843 width=6) (actual time=86182.694..94456.869 rows=1094400 loops=1)
               Filter: (set_id = 26::smallint)
 Total runtime: 114833.711 ms
(11 rows)

A desperate and final attempt (yields a different query plan, BUT chooses index scans by default!!!):

station_data=# EXPLAIN ANALYZE SELECT s.rec_time, wind_speed, wind_direction FROM ( SELECT * FROM speed WHERE set_id=25::smallint AND speed.rec_time BETWEEN '1999/01/01'AND '2000/01/01' ) AS s JOIN ( SELECT * FROM direction WHERE set_id=26::smallint AND direction.rec_time BETWEEN '1999/01/01'AND '2000/01/01') AS d ON s.rec_time=d.rec_time;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=627397.55..628565.24 rows=14679 width=8) (actual time=8532.723..10929.575 rows=105121 loops=1)
   Merge Cond: ("outer".rec_time = "inner".rec_time)
   ->  Sort  (cost=320765.28..321026.36 rows=104431 width=6) (actual time=4195.636..4643.069 rows=105121 loops=1)
         Sort Key: speed.rec_time
         ->  Index Scan using speed_pkey on speed  (cost=0.00..311175.90 rows=104431 width=6) (actual time=0.229..1890.093 rows=105121 loops=1)
               Index Cond: ((set_id = 25::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime) AND (rec_time <= '2000-01-01 00:00:00+02'::abstime))
   ->  Sort  (cost=306632.27..306881.88 rows=99841 width=6) (actual time=4337.044..4845.900 rows=105121 loops=1)
         Sort Key: direction.rec_time
         ->  Index Scan using direction_pkey on direction  (cost=0.00..297500.03 rows=99841 width=6) (actual time=0.167..1857.420 rows=105121 loops=1)
               Index Cond: ((set_id = 26::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime) AND (rec_time <= '2000-01-01 00:00:00+02'::abstime))
 Total runtime: 11350.495 ms
(11 rows)
7583.417
6528.144
6288.434
12450.558
7546.104
average 8.624 sec (~3 times slower than combined wind_data)


======================================================================================================================================================================================================================================================================================================
for new wind_data table, had to cancel:

UPDATE wind_data SET direction = d.wind_direction FROM wind_data w, direction d WHERE w.rec_time=d.rec_time AND w.set_id=d.set_id/2;

It takes longer than 7 hours to complete, the update was hence abandoned altogether since hopes of optimisation are minimal (maybe through increased memory usage)

Using the new database layout with only one table for speed and direction, we get

station_data=# EXPLAIN SELECT rec_time, speed, direction FROM wind_data WHERE set_id=39::smallint;
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on wind_data  (cost=0.00..242671.38 rows=1073843 width=8)
   Filter: (set_id = 39::smallint)
(2 rows)

station_data=# SET enable_seqscan TO OFF;
station_data=# EXPLAIN ANALYZE SELECT rec_time, speed, direction FROM wind_data WHERE set_id=39::smallint;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using wind_data_pkey on wind_data  (cost=0.00..3230761.68 rows=1073843 width=8) (actual time=219.563..41598.457 rows=1094400 loops=1)
   Index Cond: (set_id = 39::smallint)
 Total runtime: 46862.019 ms
(3 rows)

station_data=# SET enable_seqscan TO ON;
station_data=# EXPLAIN  SELECT rec_time, speed, direction FROM wind_data WHERE set_id=39::smallint AND rec_time>='1999/01/01';
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on wind_data  (cost=0.00..277571.25 rows=631929 width=8)
   Filter: ((set_id = 39::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime))
(2 rows)

station_data=# SET enable_seqscan TO OFF;
station_data=# EXPLAIN ANALYZE SELECT rec_time, speed, direction FROM wind_data WHERE set_id=39::smallint AND rec_time>='1999/01/01';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using wind_data_pkey on wind_data  (cost=0.00..1902808.95 rows=631929 width=8) (actual time=58.051..18961.633 rows=652345 loops=1)
   Index Cond: ((set_id = 39::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime))
 Total runtime: 22351.893 ms
(3 rows)

Finally, with the most complex query the planner chooses a sensible plan again no matter what:

station_data=# SET enable_seqscan TO ON;
station_data=# EXPLAIN ANALYZE SELECT rec_time, speed, direction FROM wind_data WHERE set_id=39::smallint AND rec_time BETWEEN '1999/01/01'AND '2000/01/01';
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using wind_data_pkey on wind_data  (cost=0.00..302862.80 rows=100492 width=8) (actual time=0.218..2135.167 rows=105121 loops=1)
   Index Cond: ((set_id = 39::smallint) AND (rec_time >= '1999-01-01 00:00:00+02'::abstime) AND (rec_time <= '2000-01-01 00:00:00+02'::abstime))
 Total runtime: 2646.776 ms
(3 rows)
5636.407
1085.729
2764.653
1073.631
2589.617
average 2.802 sec (fastest of all)




=======================================================================================================================================================================================
Tuning the configuration parameters(all testing performed on 7.4.8, but also
seems applicable to 8.0.3):

    Base case query - EXPLAIN ANALYZE SELECT * FROM speed WHERE set_id=25::smallint AND rec_time >= '1999/01/01';
    Chooses seqscan by default, estimate (cost=0.00..277537.25 rows=649115 width=8)
    Reference estimate of the index scan (cost=0.00..1932444.35 rows=649115 width=8)

    By tuning the following parameters, the index scan cost estimates become

    effective_cache_size (available disk cache for objects during queries, in 8KB blocks), default 1000(8MB):
        10000 - 0.00..1677583.14
        40000 - 0.00..867681.46
        100000 - 0.00..218071.09 -> first time smaller than seqscan, but unrealistic memory requirement

    random_page_cost (cost of fetching disk page non-sequentially, in multiples of seq fetch cost), default 4:
        3 - 0.00..1452747.53
        2 - 0.00..973050.72
        it shouldn't be allowed to dip below 2 (unrealistic)

    shared_buffers (shared memory), default 1000(8MB):
        3000 - identical (no change)
        5000 - identical

    cpu_tuple_cost - Sets the query planner's estimate of the cost of processing each row during a query. This is measured as a fraction of the cost of a sequential page fetch. The default is 0.01
        0.005 - seq 0.00..207737.50, index 0.00..1929198.77
        0.02 - seq 0.00..417136.75, index 0.00..1938935.50
        0.05 - seq 0.00..835935.25, index 0.00..1958408.94
        0.1 - seq 0.00..1533932.75, index 0.00..1990864.68

        random_page_cost of 3:
        0.005 - seq 0.00..207737.50, index 0.00..1449501.96
        0.02 - seq 0.00..417136.75, index 0.00..1459238.68
        0.05 - seq 0.00..835935.25, index 0.00..1478712.13
        0.1 - seq 0.00..1533932.75, index 0.00..1511167.87 -> index  better

    cpu_index_tuple_cost - Sets the query planner's estimate of the cost of
    processing each index row during an index scan. This is measured as a
    fraction of the cost of a sequential page fetch. The default is 0.001.

        0.0005 - 0.00..1932119.79
	0.0002 - 0.00..1931925.06
	0.0001 - 0.00..1931860.14
	0.00001 - 0.00..1931801.72 -> no major impact here...

Final settings:
	effective_cache_size 40000
	random_page_cost 2.5
	cpu_tuple_cost 0.08
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux