Re: View with an outer join - is there any way to optimise

John McCawley <nospam@xxxxxxxxxxxx> · Mon, 12 Dec 2005 10:42:35 -0600

First of all, check out this thread:

http://archives.postgresql.org/pgsql-general/2005-11/msg00734.php

I had a similar performance issue with a view.  Look at my solution and 
it might help you out.

Second, you might want to change your token history status from a string 
to an integer that references a status table.  If your view is causing a 
sequential scan, you're going to end up will bazillions of string 
comparisons.  I don't know if Postgres has some form of black magic 
string comparison optimization, but I generally avoid string comparisons 
when I am dealing with a few known values, as would be the case in a 
status table.

Rich Doughty wrote:

I have a view vw_tokens defined as

  CREATE VIEW tokens.vw_tokens AS SELECT
      -- too many columns to mention
  FROM
      tokens.ta_tokens      t LEFT JOIN
      tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
                                     i.status   = 'issued'   LEFT JOIN
      tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
                                     s.status   = 'sold'     LEFT JOIN
      tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
                                   r.status   = 'redeemed'
  ;

the ta_tokens table contains approx 4 million records, and ta_tokenhist
approx 10 millions. queries against the view itself on the primary key
execute with no issues at all.

I cannot however perform a meaningful join against this view. when i
execute

  SELECT *
  FROM
      tokens.ta_tokenhist h INNER JOIN
      tokens.vw_tokens    t ON h.token_id = t.token_id
  WHERE
      h.sarreport_id = 9
  ;

PG forms the full output of the view. the query plan is

 Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 
width=470)
         Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
         ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
               Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
               ->  Hash Left Join  (cost=559931.55..1093783.71 
rows=4052907 width=174)
                     Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
                     ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)
                     ->  Hash  (cost=459239.41..459239.41 rows=4114456 
width=152)
                           ->  Seq Scan on ta_tokenhist i  
(cost=0.00..459239.41 rows=4114456 width=152)
                                 Filter: ((status)::text = 
'issued'::text)
               ->  Hash  (cost=459239.41..459239.41 rows=3905186 
width=152)
                     ->  Seq Scan on ta_tokenhist s  
(cost=0.00..459239.41 rows=3905186 width=152)
                           Filter: ((status)::text = 'sold'::text)
         ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
               ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 
rows=2617645 width=152)
                     Filter: ((status)::text = 'redeemed'::text)
   ->  Hash  (cost=6.01..6.01 rows=1 width=236)
         ->  Index Scan using fkx_tokenhist__sarreports on 
ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)
               Index Cond: ((sarreport_id)::integer = 9)

I have also tried explicitly querying token_id in the view, hoping
to force a nested loop:

  EXPLAIN
  SELECT *
  FROM
      tokens.vw_tokens__user
  WHERE
    token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE 
sarreport_id = 9);

                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------- 

 Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 
width=470)
         Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
         ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 
width=322)
               Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
               ->  Hash Left Join  (cost=559931.55..1093783.71 
rows=4052907 width=174)
                     Hash Cond: (("outer".token_id)::integer = 
("inner".token_id)::integer)
                     ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 
rows=4052907 width=26)
                     ->  Hash  (cost=459239.41..459239.41 rows=4114456 
width=152)
                           ->  Seq Scan on ta_tokenhist i  
(cost=0.00..459239.41 rows=4114456 width=152)
                                 Filter: ((status)::text = 
'issued'::text)
               ->  Hash  (cost=459239.41..459239.41 rows=3905186 
width=152)
                     ->  Seq Scan on ta_tokenhist s  
(cost=0.00..459239.41 rows=3905186 width=152)
                           Filter: ((status)::text = 'sold'::text)
         ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
               ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 
rows=2617645 width=152)
                     Filter: ((status)::text = 'redeemed'::text)
   ->  Hash  (cost=6.01..6.01 rows=1 width=4)
         ->  Index Scan using fkx_tokenhist__sarreports on 
ta_tokenhist  (cost=0.00..6.01 rows=1 width=4)
               Index Cond: ((sarreport_id)::integer = 9)

Setting enable_mergejoin and enable_hashjoin to off results in a nested
but still forms the view output.

I can achieve the results i need be eliminating the view and writing
the query manually but for various reasons i'd prefer to query a
view.

Any advice is greatly appreciated. i'm starting to wonder if the using
a view in this instance is futile.

Many thanks