JOIN to a VIEW makes a real slow query

"Chuck D." <pgsql-performance@xxxxxxxxxx> · Tue, 13 Feb 2007 12:53:55 -0600

Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem but I wanted to 
check here first.  I've seen this discussed on the list before but I'm still 
not sure of the solution.  Maybe my query is just structured wrong.

I recently visited an old project of mine that has a 'city', 'state,' 
and 'country' tables.  The city data comes from multiple sources and totals 
about 3 million rows.  I decided to split the city table up based on the 
source (world_city, us_city).  This makes easier updating because the 
assigned feature id's from the two sources overlap in some cases making it 
impossible to update as a single merged table.

However, I decided to create a view to behave like the old 'city' table.  The 
view is just a simple:

SELECT [columns]
FROM world_city
UNION
SELECT [columns]
FROM us_city
;

Selecting from the view is very quick, but JOINing to the view is slow.  About 
65 seconds to select a city.  It doesn't matter wether it is joined to one 
table or 6 like it is in my user_detail query - it is still slow.  It has 
indexes on the city_id, state_id, country_id of each table in the view too.  
Everything has been 'VACUUM ANALYZE' ed.

When using explain analyze from the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=650146.58..751018.45 rows=10618 width=55) 
(actual time=53078.261..61269.190 rows=1 loops=1)
   Join Filter: ("outer".city_id = "inner"."?column1?")
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual 
time=0.010..0.022 rows=1 loops=1)
   ->  Unique  (cost=650146.58..703236.51 rows=2123597 width=62) (actual 
time=49458.007..59635.140 rows=2122712 loops=1)
         ->  Sort  (cost=650146.58..655455.58 rows=2123597 width=62) (actual 
time=49458.003..55405.965 rows=2122712 loops=1)
               Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat, 
lon, city_name
               ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual 
time=18.835..13706.395 rows=2122712 loops=1)
                     ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 
width=62) (actual time=18.832..620.553 rows=169398 loops=1)
                     ->  Seq Scan on world_city  (cost=0.00..47632.88 
rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
 Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So, a sequence scan on the tables in the view, won't use the index.

Then do the same query by replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=36) (actual 
time=53.854..53.871 rows=1 loops=1)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual 
time=0.010..0.016 rows=1 loops=1)
   ->  Index Scan using world_city_pk on world_city ci  (cost=0.00..3.01 
rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
         Index Cond: ("outer".city_id = ci.city_id)
 Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms

I'm not sure that a view on a UNION is the best idea but I don't know how to 
go about keeping the tables from the data sources with the view (other than 
modifying them with a source_id column).  Any ideas on what is causing the 
performance lag?