Forcing seq_scan off for large table joined with tiny table yeilds improved performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have two tables, like this:

Big table:

CREATE TABLE photo_info_data
(
  photo_id integer NOT NULL,
  field_name character varying NOT NULL,
  field_value character varying,
  CONSTRAINT photo_info_data_pk PRIMARY KEY (photo_id, field_name)
)
WITH (OIDS=FALSE);

CREATE INDEX user_info_data_ix_field_value
  ON user_info_data
  USING btree
  (field_value);


Small table:

CREATE TABLE t_query_data
(
  i integer,
  "key" character varying,
  op character varying,
  "value" character varying
)
WITH (OIDS=FALSE);

I have around 2400000 rows in photo_info_data, and just two rows in t_query_data:
 i |  key     | op | value
---+----------+----+--------
 1 | f-stop   | eq | 2.6
 2 | shutter  | gt | 1/100


This is the query I'm executing:

SELECT
	*
FROM
	photo_info_data u
	JOIN t_query_data t on u.field_name = key

This query takes around 900ms to execute. It returns 6 rows.

When I do 'explain analyze' for some reason it takes around 7 seconds, and this is what I get:

phototest=# explain analyze select * from photo_info_data u join t_query_data t on u.field_name = key; QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..58676.31 rows=218048 width=68) (actual time=2381.895..7087.225 rows=6 loops=1)
   Hash Cond: ((u.field_name)::text = (t.key)::text)
-> Seq Scan on photo_info_data u (cost=0.00..47500.30 rows=2398530 width=50) (actual time=0.042..3454.112 rows=2398446 loops=1) -> Hash (cost=1.02..1.02 rows=2 width=18) (actual time=0.016..0.016 rows=2 loops=1) -> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2 width=18) (actual time=0.003..0.007 rows=2 loops=1)
 Total runtime: 7087.291 ms
(6 rows)

Time: 7088.663 ms


I can rerun this query many times, it's always around 7 seconds. I/O wait during the query is nonexistant, it just takes 100% of CPU time (i have a DualCore Opteron server).

If I force the planner not to use sequential_scan, here is what I get:

phototest=# explain analyze select * from photo_info_data u join t_query_data t on u.field_name = key; QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=100039134.84..100130206.79 rows=218048 width=68) (actual time=271.138..540.998 rows=6 loops=1) -> Seq Scan on t_query_data t (cost=100000000.00..100000001.02 rows=2 width=18) (actual time=0.008..0.015 rows=2 loops=1) -> Bitmap Heap Scan on photo_info_data u (cost=39134.84..63740.08 rows=109024 width=50) (actual time=270.464..270.469 rows=3 loops=2)
         Recheck Cond: ((u.field_name)::text = (t.key)::text)
-> Bitmap Index Scan on photo_info_data_pk (cost=0.00..39107.59 rows=109024 width=0) (actual time=270.435..270.435 rows=3 loops=2)
               Index Cond: ((u.field_name)::text = (t.key)::text)
 Total runtime: 541.065 ms
(7 rows)

Time: 542.147 ms


The database currently has only those two tables. I have vacuumed them prior running above queries.

I tought this information also might be important:
phototest=# select key, count(*) from photo_info_data u join t_query_data t on u.field_name = key group by key;
  key     | count
----------+-------
 f-stop   |     3
 shutter  |     3
(2 rows)


Am I doing something wrong here? The photo_info_data would hold around 10.000.000 records, should I be doing 'set seq_scan to false' each time I will want to run this query? (Since I'm accessing postgres trough JDBC I'll have same situation I had weeks ago, I described it here also).

	Mike

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux