Really dumb planner decision

Matthew Wakeling <matthew@xxxxxxxxxxx> · Thu, 16 Apr 2009 11:37:29 +0100 (BST)

I have a query that is executed really badly by Postgres. It is a nine 
table join, where two of the tables are represented in a view. If I remove 
one of the tables from the query, then the query runs very quickly using a 
completely different plan.

Here is the view:

release-16.0-preview-09-apr=# \d locatedsequencefeatureoverlappingfeatures
View "public.locatedsequencefeatureoverlappingfeatures"
         Column         |  Type   | Modifiers
------------------------+---------+-----------
 overlappingfeatures    | integer |
 locatedsequencefeature | integer |
View definition:
 SELECT l1.subjectid AS overlappingfeatures, l2.subjectid AS locatedsequencefeature
   FROM location l1, location l2
  WHERE l1.objectid = l2.objectid AND l1.subjectid <> l2.subjectid AND bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end);

Here is the query that works:

SELECT *
FROM
    gene AS a1_,
    intergenicregion AS a2_,
    regulatoryregion AS a3_,
    chromosome AS a4_,
    location AS a5_,
    LocatedSequenceFeatureOverlappingFeatures AS indirect0,
    BioEntitiesDataSets AS indirect1
WHERE
        a1_.id = 1267676
    AND a1_.upstreamIntergenicRegionId = a2_.id
    AND a2_.id = indirect0.LocatedSequenceFeature
    AND indirect0.OverlappingFeatures = a3_.id
    AND a3_.chromosomeid = a4_.id
    AND a3_.chromosomeLocationId = a5_.id
    AND a3_.id = indirect1.BioEntities

QUERY PLAN
-----------------------------------------------------------------
 Nested Loop  (cost=0.00..44.82 rows=1 width=787)
              (actual time=18.347..184.178 rows=105 loops=1)
   ->  Nested Loop
        (cost=0.00..44.54 rows=1 width=626)
        (actual time=18.329..182.837 rows=105 loops=1)
         ->  Nested Loop
              (cost=0.00..43.82 rows=1 width=561)
              (actual time=18.249..180.801 rows=105 loops=1)
               ->  Nested Loop
                    (cost=0.00..43.51 rows=1 width=380)
                    (actual time=10.123..178.471 rows=144 loops=1)
                     ->  Nested Loop
                          (cost=0.00..42.85 rows=1 width=372)
                          (actual time=0.854..31.446 rows=142 loops=1)
                           ->  Nested Loop
                                (cost=0.00..38.57 rows=1 width=168)
                                (actual time=0.838..29.505 rows=142 loops=1)
                                 Join Filter: ((l1.subjectid <> l2.subjectid) AND (l2.objectid = l1.objectid))
                                 ->  Nested Loop
                                      (cost=0.00..10.02 rows=1 width=176)
                                      (actual time=0.207..0.218 rows=1 loops=1)
                                       ->  Index Scan using gene_pkey on gene a1_
                                             (cost=0.00..4.29 rows=1 width=160)
                                             (actual time=0.107..0.110 rows=1 loops=1)
                                             Index Cond: (id = 1267676)
                                       ->  Index Scan using location__key_all on location l2
                                             (cost=0.00..5.70 rows=2 width=16)
                                             (actual time=0.090..0.093 rows=1 loops=1)
                                             Index Cond: (l2.subjectid = a1_.upstreamintergenicregionid)
                                 ->  Index Scan using location_bioseg on location l1
                                       (cost=0.00..12.89 rows=696 width=16)
                                       (actual time=0.095..26.458 rows=1237 loops=1)
                                       Index Cond: (bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end))
                           ->  Index Scan using intergenicregion_pkey on intergenicregion a2_
                                 (cost=0.00..4.27 rows=1 width=204)
                                 (actual time=0.004..0.006 rows=1 loops=142)
                                 Index Cond: (a2_.id = a1_.upstreamintergenicregionid)
                     ->  Index Scan using bioentitiesdatasets__bioentities on bioentitiesdatasets indirect1
                           (cost=0.00..0.63 rows=2 width=8)
                           (actual time=1.026..1.028 rows=1 loops=142)
                           Index Cond: (indirect1.bioentities = l1.subjectid)
               ->  Index Scan using regulatoryregion_pkey on regulatoryregion a3_
                     (cost=0.00..0.29 rows=1 width=181)
                     (actual time=0.008..0.009 rows=1 loops=144)
                     Index Cond: (a3_.id = l1.subjectid)
         ->  Index Scan using location_pkey on location a5_
               (cost=0.00..0.71 rows=1 width=65)
               (actual time=0.010..0.012 rows=1 loops=105)
               Index Cond: (a5_.id = a3_.chromosomelocationid)
   ->  Index Scan using chromosome_pkey on chromosome a4_
         (cost=0.00..0.27 rows=1 width=161)
         (actual time=0.003..0.005 rows=1 loops=105)
         Index Cond: (a4_.id = a3_.chromosomeid)
 Total runtime: 184.596 ms
(25 rows)

Here is the query that does not work:

SELECT *
FROM
    gene AS a1_,
    intergenicregion AS a2_,
    regulatoryregion AS a3_,
    chromosome AS a4_,
    location AS a5_,
    dataset AS a6_,
    LocatedSequenceFeatureOverlappingFeatures AS indirect0,
    BioEntitiesDataSets AS indirect1
WHERE
        a1_.id = 1267676
    AND a1_.upstreamIntergenicRegionId = a2_.id
    AND a2_.id = indirect0.LocatedSequenceFeature
    AND indirect0.OverlappingFeatures = a3_.id
    AND a3_.chromosomeid = a4_.id
    AND a3_.chromosomeLocationId = a5_.id
    AND a3_.id = indirect1.BioEntities
    AND indirect1.DataSets = a6_.id

I just left this running overnight, and it hasn't completed an EXPLAIN 
ANALYSE. It is basically the previous query (which returns 105 rows) with 
another table attached on a primary key. Should be very quick.

QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop
   (cost=0.21..49789788.95 rows=1 width=960)
   ->  Nested Loop
         (cost=0.21..49789788.67 rows=1 width=799)
         ->  Nested Loop
               (cost=0.21..49789787.94 rows=1 width=734)
               ->  Merge Join
                     (cost=0.21..49789787.64 rows=1 width=553)
                     Merge Cond: (a1_.upstreamintergenicregionid = a2_.id)
                     ->  Nested Loop
                           (cost=0.00..99575037.26 rows=2 width=349)
                           ->  Nested Loop
                                 (cost=0.00..99575036.70 rows=2 width=176)
                                 ->  Nested Loop
                                       (cost=0.00..99575036.05 rows=1 width=168)
                                       Join Filter: (a1_.upstreamintergenicregionid = l2.subjectid)
                                       ->  Index Scan using gene__upstreamintergenicregion on gene a1_
                                             (cost=0.00..6836.09 rows=1 width=160)
                                             Index Cond: (id = 1267676)
                                       ->  Nested Loop
                                             (cost=0.00..99507386.51 rows=4865076 width=8)
                                             Join Filter: ((l1.subjectid <> l2.subjectid) AND (l1.objectid = l2.objectid))
                                             ->  Index Scan using location__key_all on location l1
                                                   (cost=0.00..158806.58 rows=3479953 width=16)
                                             ->  Index Scan using location_bioseg on location l2
                                                   (cost=0.00..12.89 rows=696 width=16)
                                                   Index Cond: (bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end))
                                 ->  Index Scan using bioentitiesdatasets__bioentities on bioentitiesdatasets indirect1
                                       (cost=0.00..0.63 rows=2 width=8)
                                       Index Cond: (indirect1.bioentities = l1.subjectid)
                           ->  Index Scan using dataset_pkey on dataset a6_
                                 (cost=0.00..0.27 rows=1 width=173)
                                 Index Cond: (a6_.id = indirect1.datasets)
                     ->  Index Scan using intergenicregion_pkey on intergenicregion a2_
                           (cost=0.00..2132.03 rows=54785 width=204)
               ->  Index Scan using regulatoryregion_pkey on regulatoryregion a3_
                     (cost=0.00..0.29 rows=1 width=181)
                     Index Cond: (a3_.id = l1.subjectid)
         ->  Index Scan using location_pkey on location a5_
               (cost=0.00..0.71 rows=1 width=65)
               Index Cond: (a5_.id = a3_.chromosomelocationid)
   ->  Index Scan using chromosome_pkey on chromosome a4_
         (cost=0.00..0.27 rows=1 width=161)
         Index Cond: (a4_.id = a3_.chromosomeid)
(27 rows)

I'm curious about two things - firstly why is it choosing such a dumb way 
of joining l1 to l2, with a full index scan on l1, where it could use a 
conditional index scan on l1 as with the working query? Secondly, why is 
the merge join's cost half that of the nested loop inside it?

geqo threshold is set to 15, so this is not the genetic optimiser stuffing 
up. Besides, it creates the same plan each time. The database is fully 
analysed with a reasonably high statistics target. Here are all the 
non-comment entries in postgresql.conf:

listen_addresses = '*'          # what IP address(es) to listen on;
max_connections = 300                   # (change requires restart)
shared_buffers = 500MB                  # min 128kB or max_connections*16kB
temp_buffers = 100MB                    # min 800kB
work_mem = 2000MB                               # min 64kB
maintenance_work_mem = 1600MB           # min 1MB
max_stack_depth = 9MB                   # min 100kB
max_fsm_pages = 204800                  # min max_fsm_relations*16, 6 bytes each
random_page_cost = 2.0                  # same scale as above
effective_cache_size = 23GB
geqo_threshold = 15
default_statistics_target = 500         # range 1-1000
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
log_directory = 'pg_log'                # directory where log files are written,
log_truncate_on_rotation = on           # If on, an existing log file of the
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_min_duration_statement = 0          # -1 is disabled, 0 logs all statements
log_duration = on
log_line_prefix = '%t '                 # special values:
log_statement = 'all'                   # none, ddl, mod, all
datestyle = 'iso, mdy'
lc_messages = 'C'                       # locale for system error message
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting
default_text_search_config = 'pg_catalog.english'

Anything I can do to solve this?

Matthew

--
Surely the value of C++ is zero, but C's value is now 1?
 -- map36, commenting on the "No, C++ isn't equal to D. 'C' is undeclared
 [...] C++ should really be called 1" response to "C++ -- shouldn't it
 be called D?"

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance