> -----Original Message----- > From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx] > Sent: Friday, July 20, 2012 4:47 PM > To: David Johnston > Cc: pgsql-general@xxxxxxxxxxxxxx > Subject: Re: A Better Way? (Multi-Left Join Lookup) > > "David Johnston" <polobo@xxxxxxxxx> writes: > > WITH > > full_set AS ( ) -- 8,500 records > > , sub_1 AS () -- also about 8,500 > > , sub_2 AS () -- maybe 5,000 > > , sub_3 AS () - - maybe 3,000 > > SELECT full_set.* > > , COALESCE(sub_1.field, FALSE) > > , COALESCE(sub_2.field, FALSE) > > , COALESCE(sub_2.field, FALSE) > > FROM full_set > > LEFT JOIN sub_1 > > LEFT JOIN sub_2 > > LEFT JOIN sub_3 > > > The performance of this query is exponential due to the fact that the > > sub-queries/CTEs are not indexed and so each subset has to be scanned > > completely for each record in the full set. > > Surely not. Neither merge nor hash joins require an index. What plan is > getting selected? Are you sure there's at most one match in each "sub" set > for each row in the "full" set? If you were getting a large number of matches > in some cases, the size of the result could balloon to something unfortunate > ... but we have not got enough information to know. > > regards, tom lane The final result, in this case would have 8,500 records AND sub_1.field would be TRUE for basically all of them and FALSE for the minimal remainder sub_2.field would be TRUE for 5,000 of them and FALSE for 3,500 of them sub_3.field would be TRUE for 3,000 of them and FALSE for 5,500 of them There is never, in reality, two records in a sub-table for a single record in the master table. It is possible a record exists in a sub-table but not in the main table but I do not care about those (thus the LEFT instead of a FULL OUTER JOIN). I have attached a scrubbed query and explain/analyze. Let me know if something more is needed. I have included two versions of the query, one using CTE and the other using mostly sub-selects. I had run ANALYZE on the pertinent tables but the CTE queries all perform quite quickly when run by themselves. In looking at the source tables for the data I did notice that I have not properly defined the relevant INDEXes as being UNIQUE. This applies to two of the sub-tables. The third sub-table requires the use of "DISTINCT". The joining columns with each set of data are unique when fed into the LEFT JOIN. The master CTE/Query is generated via a function call and it also generates unique keys for the LEFT JOIN. Thank you for your help! David J.
QUERY WITH scenario_info AS () , source_listing AS () , detail_listing AS () , sequence_listing AS () , master_listing AS ( SELECT -- identifier fields FROM ( SELECT (func).* FROM ( SELECT fuction_generating_8500_records(...) ) func FROM scenario_info ) call ) master (function_column_rename) ) --# end "master_listing" CTE SELECT scenariokey AS "Scenario Key" , invoicenumber AS "Invoice Number" , COALESCE(has_sequence,FALSE) AS "Has Sequence" , COALESCE(has_detail,FALSE) AS "Has Detail" , COALESCE(has_source,FALSE) AS "Has Source" /* Omit a CASE WHEN + COALESCE That Provides Names To Each of the Possible Combinations */ FROM master_listing LEFT JOIN source_listing USING () LEFT JOIN detail_listing USING () LEFT JOIN sequence_listing USING () ; QUERY PLAN Nested Loop Left Join (cost=57926.57..58810.79 rows=1 width=67) (actual time=842.368..263801.626 rows=8656 loops=1) CTE scenario_info {# Support CTE #} -> Index Scan [...] (cost=0.25..8.52 rows=1 width=65) (actual time=44.855..44.859 rows=1 loops=1) CTE source_listing {# Sub 1 #} -> Nested Loop (cost=120.26..12219.68 rows=3594 width=21) (actual time=1.831..9.624 rows=2091 loops=1) Buffers: shared hit=408 -> CTE Scan [...] (cost=0.00..0.02 rows=1 width=62) (actual time=0.001..0.006 rows=1 loops=1) -> Bitmap Heap Scan [...] (cost=120.26..12174.74 rows=3594 width=21) (actual time=1.813..6.711 rows=2091 loops=1) -> Bitmap Index Scan [...] (cost=0.00..119.36 rows=3594 width=0) (actual time=1.604..1.604 rows=2091 loops=1) CTE detail_listing {# Sub 2 #} -> HashAggregate (cost=3046.18..3167.02 rows=12084 width=41) (actual time=129.044..132.509 rows=6042 loops=1) -> Seq Scan on [...] (cost=0.00..2872.53 rows=23153 width=41) (actual time=0.004..60.430 rows=23153 loops=1) CTE sequence_listing {# Sub 3 #} -> Bitmap Heap Scan [...] (cost=623.58..19768.42 rows=9585 width=36) (actual time=64.801..77.267 rows=8656 loops=1) -> Bitmap Index Scan [...] (cost=0.00..621.18 rows=9585 width=0) (actual time=64.635..64.635 rows=8656 loops=1) CTE master_listing {# The LEFT side of the multi-joins #} -> Subquery Scan on call (cost=22762.65..22762.94 rows=1 width=32) (actual time=619.158..735.559 rows=8656 loops=1) -> CTE Scan on scenario_info (cost=22762.65..22762.93 rows=1 width=196) (actual time=619.150..658.142 rows=8656 loops=1) InitPlan 5 (returns $4) -> HashAggregate (cost=22752.62..22762.65 rows=1003 width=7) (actual time=219.015..219.172 rows=796 loops=1) -> Append (cost=19790.25..22750.11 rows=1003 width=7) (actual time=98.569..217.774 rows=1510 loops=1) -> HashAggregate (cost=19790.25..19790.47 rows=22 width=6) (actual time=98.567..99.428 rows=796 loops=1) -> Bitmap Heap Scan on [...] (cost=621.20..19790.01 rows=96 width=6) (actual time=49.550..96.681 rows=796 loops=1) Recheck Cond: [...] Filter: [...] -> Bitmap Index Scan on [...] (cost=0.00..621.18 rows=9585 width=0) (actual time=49.350..49.350 rows=8656 loops=1) -> HashAggregate (cost=2939.80..2949.61 rows=981 width=7) (actual time=117.373..117.543 rows=714 loops=1) -> Seq Scan on [...] (cost=0.00..2930.41 rows=3756 width=7) (actual time=0.582..115.888 rows=3429 loops=1) { # The CTE Queries Return Quite Quickly #} { # Now We Combine Them #} -> Nested Loop Left Join (cost=0.00..461.28 rows=1 width=66) (actual time=700.682..169850.684 rows=8656 loops=1) Join Filter: [...] -> Nested Loop Left Join (cost=0.00..125.81 rows=1 width=65) (actual time=635.866..33832.263 rows=8656 loops=1) Join Filter: [...] -> CTE Scan on master_listing (cost=0.00..0.02 rows=1 width=96) (actual time=619.163..798.343 rows=8656 loops=1) -> CTE Scan on source_listing (cost=0.00..71.88 rows=3594 width=101) (actual time=0.001..1.335 rows=2091 loops=8656) -> CTE Scan on sequence_listing (cost=0.00..191.70 rows=9585 width=143) (actual time=0.008..5.505 rows=8656 loops=8656) -> CTE Scan on detail_listing (cost=0.00..241.68 rows=12084 width=143) (actual time=0.019..3.790 rows=6042 loops=8656) Total runtime: 263811.946 ms # For this plan instead of using CTE the query inside the CTEs was moved inside the LEFT JOIN # I.E. LEFT JOIN ( <CTE QUERY> ) sub_n USING (...) WITH scenario_info AS () , master_listing AS ( SELECT scenariokey , invoicenumber , COALESCE(has_sequence, FALSE) AS has_sequence , COALESCE(has_detail, FALSE) AS has_detail , COALESCE(has_source, FALSE) AS has_source FROM ( SELECT (func).* FROM ( SELECT mage_create_sequencemaster(...) ) func FROM scenario_info ) call ) master (...) LEFT JOIN () seq USING (...) LEFT JOIN () detail USING (...) LEFT JOIN () src USING (...) ORDER BY invoicesequence --# Didn't Bother Ordering the other query... ) SELECT [...] FROM master_listing ; QUERY PLAN CTE Scan on master_listing (cost=26258.77..26258.79 rows=1 width=67) (actual time=541681.104..541690.533 rows=8656 loops=1) CTE scenario_info -> Index Scan using magescenario_pkey on magescenario (cost=0.25..8.52 rows=1 width=65) (actual time=40.845..40.852 rows=1 loops=1) CTE master_listing -> Sort (cost=26250.24..26250.25 rows=1 width=35) (actual time=541681.097..541682.659 rows=8656 loops=1) Sort Method: quicksort Memory: 1111kB -> Nested Loop Left Join (cost=25809.07..26250.23 rows=1 width=35) (actual time=860.953..541541.541 rows=8656 loops=1) -> Nested Loop Left Join (cost=25809.07..26241.73 rows=1 width=34) (actual time=860.769..540748.350 rows=8656 loops=1) -> Nested Loop Left Join (cost=22762.90..22772.61 rows=1 width=33) (actual time=758.079..415662.947 rows=8656 loops=1) -> CTE Scan on scenario_info (cost=22762.65..22762.93 rows=1 width=196) (actual time=715.729..816.423 rows=8656 loops=1) InitPlan 2 (returns $1) -> HashAggregate (cost=22752.62..22762.65 rows=1003 width=7) (actual time=243.929..244.647 rows=796 loops=1) -> Append (cost=19790.25..22750.11 rows=1003 width=7) (actual time=117.295..242.829 rows=1510 loops=1) -> HashAggregate (cost=19790.25..19790.47 rows=22 width=6) (actual time=117.294..117.476 rows=796 loops=1) -> Bitmap Heap Scan on [...] (cost=621.20..19790.01 rows=96 width=6) (actual time=83.059..116.925 rows=796 loops=1) -> Bitmap Index Scan [...] (cost=0.00..621.18 rows=9585 width=0) (actual time=82.989..82.989 rows=8656 loops=1) -> HashAggregate (cost=2939.80..2949.61 rows=981 width=7) (actual time=123.224..124.175 rows=714 loops=1) -> Seq Scan on [...] (cost=0.00..2930.41 rows=3756 width=7) (actual time=0.171..120.353 rows=3429 loops=1) -> Index Scan [...] (cost=0.25..9.66 rows=1 width=37) (actual time=0.099..0.110 rows=1 loops=8656) -> HashAggregate (cost=3046.18..3167.02 rows=12084 width=41) (actual time=0.013..6.093 rows=6042 loops=8656) -> Seq Scan on [...] (cost=0.00..2872.53 rows=23153 width=41) (actual time=0.005..48.263 rows=23153 loops=1) -> Index Scan using [...] (cost=0.00..8.49 rows=1 width=22) (actual time=0.065..0.066 rows=0 loops=8656) Total runtime: 541694.052 ms
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general