Query on view radically slower than query on underlying table

Craig James <craig_james@xxxxxxxxxxxxxx> · Mon, 28 Feb 2011 10:28:44 -0800

We have a medium-sized catalog (about 5 million rows), but some of our customers only want to see portions of it.  I've been experimenting with a customer-specific schema that contains nothing but a "join table" -- just the primary keys of that portion of the data that each customer wants to see, which is used to create a view that looks like the original table. But the most important query, the one that customers use to scan page-by-page through search results, turns out to be far too slow (65 seconds versus 55 milliseconds).

Below are the results of two explain/analyze statements.  The first one uses the view, the second one goes directly to the original tables.  I thought this would be a slam-dunk, that it would return results in a flash because the view is created from two tables with the same primary keys.

My guess (and it's just a wild guess) is that the "left join" is forcing a sequence scan or something.  But we need the left join, because it's on a "hitlist" that recorded all the matches to a customer's earlier query, and if rows have been removed from the tables, the customer needs to see a blank row.

Here is the "bad" query, which is run on the view:

em=> explain analyze
select version.version_id, version.isosmiles
from hitlist_rows_reset_140
left join version on (hitlist_rows_reset_140.objectid = version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.sortorder;
                                                                        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
------------------------------
 Nested Loop Left Join  (cost=23687.51..215315.74 rows=1 width=54) (actual time=2682.662..63680.076 rows=10 loops=1)
   Join Filter: (hitlist_rows_reset_140.objectid = v.version_id)
   ->  Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140  (cost=0.00..8.36 rows=1 width=8) (actual time=
0.015..0.049 rows=10 loops=1)
         Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
   ->  Hash Join  (cost=23687.51..204666.54 rows=851267 width=50) (actual time=31.829..6263.403 rows=851267 loops=10)
         Hash Cond: (v.version_id = mv.version_id)
         ->  Seq Scan on version v  (cost=0.00..116146.68 rows=5631968 width=50) (actual time=0.006..859.758 rows=5632191 loo
ps=10)
         ->  Hash  (cost=13046.67..13046.67 rows=851267 width=4) (actual time=317.488..317.488 rows=851267 loops=1)
               ->  Seq Scan on my_version mv  (cost=0.00..13046.67 rows=851267 width=4) (actual time=2.888..115.166 rows=8512
67 loops=1)
 Total runtime: 63680.162 ms

Here is the "good" query, which is run directly on the data tables.

em=> explain analyze
select registry.version.version_id, registry.version.isosmiles
from hitlist_rows_reset_140
left join registry.version on (hitlist_rows_reset_140.objectid = registry.version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.SortOrder;
                                                                        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
------------------------------
 Nested Loop Left Join  (cost=0.00..17.73 rows=1 width=54) (actual time=36.022..55.558 rows=10 loops=1)
   ->  Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140  (cost=0.00..8.36 rows=1 width=8) (actual time=
0.021..0.025 rows=10 loops=1)
         Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
   ->  Index Scan using version_pkey on version  (cost=0.00..9.35 rows=1 width=50) (actual time=5.551..5.552 rows=1 loops=10)
         Index Cond: (hitlist_rows_reset_140.objectid = version.version_id)
 Total runtime: 55.608 ms
(6 rows)

The view is defined like this:

em=> \d my_version
Table "test_schema.my_version"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer | not null
Indexes:
    "my_version_pkey" PRIMARY KEY, btree (version_id)

em=> \d version
 View "test_schema.version"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer |
 parent_id  | integer |
 isosmiles  | text    |
 coord_2d   | text    |
View definition:
 SELECT v.version_id, v.parent_id, v.isosmiles, v.coord_2d
   FROM registry.version v
   JOIN my_version mv ON mv.version_id = v.version_id;

This is:

  Postgres 8.4.4
  Ubuntu Linux 2.6.32-27
  Database: 8x7200 RAID 10, LSI RAID controller with BBU
  WAL: 2x7200 RAID1

Non-default config parameters:

max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
track_activities = on
track_counts = off
track_functions = none
escape_string_warning = off

Thanks,
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance