Search Postgresql Archives

Re: A Better Way? (Multi-Left Join Lookup)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
> Sent: Friday, July 20, 2012 4:47 PM
> To: David Johnston
> Cc: pgsql-general@xxxxxxxxxxxxxx
> Subject: Re:  A Better Way? (Multi-Left Join Lookup)
> 
> "David Johnston" <polobo@xxxxxxxxx> writes:
> > WITH
> >   full_set AS ( ) -- 8,500 records
> > , sub_1 AS () -- also about 8,500
> > , sub_2 AS () -- maybe 5,000
> > , sub_3 AS () - - maybe 3,000
> > SELECT full_set.*
> > , COALESCE(sub_1.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > FROM full_set
> > LEFT JOIN sub_1
> > LEFT JOIN sub_2
> > LEFT JOIN sub_3
> 
> > The performance of this query is exponential due to the fact that the
> > sub-queries/CTEs are not indexed and so each subset has to be scanned
> > completely for each record in the full set.
> 
> Surely not.  Neither merge nor hash joins require an index.  What plan is
> getting selected?  Are you sure there's at most one match in each "sub"
set
> for each row in the "full" set?  If you were getting a large number of
matches
> in some cases, the size of the result could balloon to something
unfortunate
> ... but we have not got enough information to know.
> 
> 			regards, tom lane

The final result, in this case would have 8,500 records AND
sub_1.field would be TRUE for basically all of them and FALSE for the
minimal remainder
sub_2.field would be TRUE for 5,000 of them and FALSE for 3,500 of them
sub_3.field would be TRUE for 3,000 of them and FALSE for 5,500 of them

There is never, in reality, two records in a sub-table for a single record
in the master table.  It is possible a record exists in a sub-table but not
in the main table but I do not care about those (thus the LEFT instead of a
FULL OUTER JOIN).

I have attached a scrubbed query and explain/analyze.  Let me know if
something more is needed.

I have included two versions of the query, one using CTE and the other using
mostly sub-selects.

I had run ANALYZE on the pertinent tables but the CTE queries all perform
quite quickly when run by themselves.

In looking at the source tables for the data I did notice that I have not
properly defined the relevant INDEXes as being UNIQUE.  This applies to two
of the sub-tables.  The third sub-table requires the use of "DISTINCT".  The
joining columns with each set of data are unique when fed into the LEFT
JOIN.  The master CTE/Query is generated via a function call and it also
generates unique keys for the LEFT JOIN.

Thank you for your help!

David J.




QUERY

WITH 
scenario_info AS ()
, source_listing AS ()
, detail_listing AS ()
, sequence_listing AS ()
, master_listing AS (

SELECT 
	
	-- identifier fields
       
	FROM (
		SELECT (func).* FROM (
			SELECT fuction_generating_8500_records(...)	            
	        ) func
	        FROM scenario_info
	     ) call
	) master (function_column_rename)

)
 --# end "master_listing" CTE
SELECT 
	  scenariokey AS "Scenario Key"
	, invoicenumber AS "Invoice Number"
	, COALESCE(has_sequence,FALSE) AS "Has Sequence"
    	, COALESCE(has_detail,FALSE) AS "Has Detail"
    	, COALESCE(has_source,FALSE) AS "Has Source"
	/* Omit a CASE WHEN + COALESCE That Provides Names To Each of the Possible Combinations */
	
FROM master_listing 
 LEFT JOIN source_listing USING ()
 LEFT JOIN detail_listing USING ()
 LEFT JOIN sequence_listing USING ()
;

QUERY PLAN
Nested Loop Left Join  (cost=57926.57..58810.79 rows=1 width=67) (actual time=842.368..263801.626 rows=8656 loops=1)
  CTE scenario_info  {# Support CTE #}
    ->  Index Scan [...] (cost=0.25..8.52 rows=1 width=65) (actual time=44.855..44.859 rows=1 loops=1)
  
  CTE source_listing {# Sub 1 #}
    ->  Nested Loop  (cost=120.26..12219.68 rows=3594 width=21) (actual time=1.831..9.624 rows=2091 loops=1)
          Buffers: shared hit=408
          ->  CTE Scan [...]  (cost=0.00..0.02 rows=1 width=62) (actual time=0.001..0.006 rows=1 loops=1)
          ->  Bitmap Heap Scan [...]  (cost=120.26..12174.74 rows=3594 width=21) (actual time=1.813..6.711 rows=2091 loops=1)
                ->  Bitmap Index Scan [...] (cost=0.00..119.36 rows=3594 width=0) (actual time=1.604..1.604 rows=2091 loops=1)
  CTE detail_listing {# Sub 2 #}
    ->  HashAggregate  (cost=3046.18..3167.02 rows=12084 width=41) (actual time=129.044..132.509 rows=6042 loops=1)
          ->  Seq Scan on [...]  (cost=0.00..2872.53 rows=23153 width=41) (actual time=0.004..60.430 rows=23153 loops=1)
              
  CTE sequence_listing {# Sub 3 #}
    ->  Bitmap Heap Scan [...] (cost=623.58..19768.42 rows=9585 width=36) (actual time=64.801..77.267 rows=8656 loops=1)
          ->  Bitmap Index Scan [...] (cost=0.00..621.18 rows=9585 width=0) (actual time=64.635..64.635 rows=8656 loops=1)
  
  CTE master_listing {# The LEFT side of the multi-joins #}
    ->  Subquery Scan on call  (cost=22762.65..22762.94 rows=1 width=32) (actual time=619.158..735.559 rows=8656 loops=1)
          ->  CTE Scan on scenario_info  (cost=22762.65..22762.93 rows=1 width=196) (actual time=619.150..658.142 rows=8656 loops=1)
                 InitPlan 5 (returns $4)
                  ->  HashAggregate  (cost=22752.62..22762.65 rows=1003 width=7) (actual time=219.015..219.172 rows=796 loops=1)
                        
                        ->  Append  (cost=19790.25..22750.11 rows=1003 width=7) (actual time=98.569..217.774 rows=1510 loops=1)
                        
                              ->  HashAggregate  (cost=19790.25..19790.47 rows=22 width=6) (actual time=98.567..99.428 rows=796 loops=1)
                        
                                    ->  Bitmap Heap Scan on [...]  (cost=621.20..19790.01 rows=96 width=6) (actual time=49.550..96.681 rows=796 loops=1)
                                          Recheck Cond: [...]
                                          Filter: [...]
                        
                                          ->  Bitmap Index Scan on [...]  (cost=0.00..621.18 rows=9585 width=0) (actual time=49.350..49.350 rows=8656 loops=1)
                                                
                                                
                              ->  HashAggregate  (cost=2939.80..2949.61 rows=981 width=7) (actual time=117.373..117.543 rows=714 loops=1)
        
                                    ->  Seq Scan on [...]  (cost=0.00..2930.41 rows=3756 width=7) (actual time=0.582..115.888 rows=3429 loops=1)
                                          
                                          
 { # The CTE Queries Return Quite Quickly #}
 { # Now We Combine Them #}
 
 ->  Nested Loop Left Join  (cost=0.00..461.28 rows=1 width=66) (actual time=700.682..169850.684 rows=8656 loops=1)
        Join Filter: [...]
        
        ->  Nested Loop Left Join  (cost=0.00..125.81 rows=1 width=65) (actual time=635.866..33832.263 rows=8656 loops=1)
              Join Filter: [...]
              
              ->  CTE Scan on master_listing  (cost=0.00..0.02 rows=1 width=96) (actual time=619.163..798.343 rows=8656 loops=1)
        
              ->  CTE Scan on source_listing  (cost=0.00..71.88 rows=3594 width=101) (actual time=0.001..1.335 rows=2091 loops=8656)
                    
        ->  CTE Scan on sequence_listing  (cost=0.00..191.70 rows=9585 width=143) (actual time=0.008..5.505 rows=8656 loops=8656)
              
  ->  CTE Scan on detail_listing  (cost=0.00..241.68 rows=12084 width=143) (actual time=0.019..3.790 rows=6042 loops=8656)
        
Total runtime: 263811.946 ms




# For this plan instead of using CTE the query inside the CTEs was moved inside the LEFT JOIN
# I.E. LEFT JOIN ( <CTE QUERY> ) sub_n USING (...)
WITH 
scenario_info AS ()
, master_listing AS (
	SELECT 
		  scenariokey
		, invoicenumber
        , COALESCE(has_sequence, FALSE) AS has_sequence
        , COALESCE(has_detail, FALSE) AS has_detail
        , COALESCE(has_source, FALSE) AS has_source
       
	FROM (
		SELECT (func).* FROM (
			SELECT mage_create_sequencemaster(...)
	        ) func
	        FROM scenario_info
	     ) call
	) master (...)
	LEFT JOIN () seq USING (...)
	LEFT JOIN () detail USING (...)
	LEFT JOIN () src USING (...)
	
	ORDER BY invoicesequence  --# Didn't Bother Ordering the other query...
)

SELECT [...] FROM master_listing

;





QUERY PLAN
CTE Scan on master_listing  (cost=26258.77..26258.79 rows=1 width=67) (actual time=541681.104..541690.533 rows=8656 loops=1)
  CTE scenario_info
    ->  Index Scan using magescenario_pkey on magescenario  (cost=0.25..8.52 rows=1 width=65) (actual time=40.845..40.852 rows=1 loops=1)
          
  CTE master_listing
    ->  Sort  (cost=26250.24..26250.25 rows=1 width=35) (actual time=541681.097..541682.659 rows=8656 loops=1)
          Sort Method:  quicksort  Memory: 1111kB
          ->  Nested Loop Left Join  (cost=25809.07..26250.23 rows=1 width=35) (actual time=860.953..541541.541 rows=8656 loops=1)
                ->  Nested Loop Left Join  (cost=25809.07..26241.73 rows=1 width=34) (actual time=860.769..540748.350 rows=8656 loops=1)
                      ->  Nested Loop Left Join  (cost=22762.90..22772.61 rows=1 width=33) (actual time=758.079..415662.947 rows=8656 loops=1)
                            ->  CTE Scan on scenario_info  (cost=22762.65..22762.93 rows=1 width=196) (actual time=715.729..816.423 rows=8656 loops=1)
                                  InitPlan 2 (returns $1)
                                    ->  HashAggregate  (cost=22752.62..22762.65 rows=1003 width=7) (actual time=243.929..244.647 rows=796 loops=1)
                                          ->  Append  (cost=19790.25..22750.11 rows=1003 width=7) (actual time=117.295..242.829 rows=1510 loops=1)
                                                ->  HashAggregate  (cost=19790.25..19790.47 rows=22 width=6) (actual time=117.294..117.476 rows=796 loops=1)
                                                      ->  Bitmap Heap Scan on [...]  (cost=621.20..19790.01 rows=96 width=6) (actual time=83.059..116.925 rows=796 loops=1)
                                                            ->  Bitmap Index Scan [...]  (cost=0.00..621.18 rows=9585 width=0) (actual time=82.989..82.989 rows=8656 loops=1)                                                             
                                                ->  HashAggregate  (cost=2939.80..2949.61 rows=981 width=7) (actual time=123.224..124.175 rows=714 loops=1)
                                                      ->  Seq Scan on [...]  (cost=0.00..2930.41 rows=3756 width=7) (actual time=0.171..120.353 rows=3429 loops=1)
                            ->  Index Scan [...]  (cost=0.25..9.66 rows=1 width=37) (actual time=0.099..0.110 rows=1 loops=8656)
                      ->  HashAggregate  (cost=3046.18..3167.02 rows=12084 width=41) (actual time=0.013..6.093 rows=6042 loops=8656)
                            ->  Seq Scan on [...]  (cost=0.00..2872.53 rows=23153 width=41) (actual time=0.005..48.263 rows=23153 loops=1)
                ->  Index Scan using [...] (cost=0.00..8.49 rows=1 width=22) (actual time=0.065..0.066 rows=0 loops=8656)                      
Total runtime: 541694.052 ms


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux