seqscan on UNION'ed views

Dmitry Karasik <dmitry@xxxxxxxxxxxxxx> · Wed, 27 Feb 2013 11:03:52 +0100

Dear all,

I have a problem with seqscan I hope you might help me with.
Attached is the simple script that reproduces a database and results, which I 
have tested both on 9.0.4 and 9.3-devel with identical results.

I need to have a sort of a time machine, where select statements on tables
could be easily replaced to select statements on tables as they were some time in the past, 
including all related table. To do so, I used views (see in the script) that UNION
both current and archive tables and filter them by a timestamp.

The problem arises when there are two such views used in a JOIN, and apparently
the query planner doesn't look deep enough into views, creating a very slow
seqscan-based plan. The setup here demonstrates how a join that needs to
extract a single row, includes a seqscan on the whole table (see 1.Bad plan in
explain.txt, and 1000 of rows are being scanned.  For the test purposes 1000
rows is not a high number, but on my system this is several millions, and that
takes significant time.

If I rewrite the query into what I would expect the planner would do for me
(see 2.Good plan), then (expectably) there are no seqscans. But I'm using an ORM
which can't rewrite joins in such a way automatically, and there are so many of
those automated queries that rewriting them by hand is also a rather bad
alternative.  So my question is, is it possible to somehow nudge the planner
into the right direction?

Thank you in advance!

-- 
Sincerely,
	Dmitry Karasik

1: Bad plan:
======================

testjoin=> EXPLAIN ANALYZE SELECT id FROM a JOIN b ON a.id = b.a_id WHERE b.id = 1;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=122.29..151.29 rows=12 width=12) (actual time=4.854..4.974 rows=1 loops=1)
   Hash Cond: (archive_a.id = archive_b.a_id)
   ->  HashAggregate  (cost=73.05..85.21 rows=1216 width=4) (actual time=3.399..4.139 rows=1000 loops=1)
         ->  Append  (cost=0.00..70.01 rows=1216 width=4) (actual time=0.052..2.164 rows=1000 loops=1)
               ->  Seq Scan on archive_a  (cost=0.00..39.10 rows=216 width=4) (actual time=0.019..0.019 rows=0 loops=1)
                     Filter: ((start_time <= 1000) AND (end_time > 1000))
                     Rows Removed by Filter: 1
               ->  Seq Scan on table_a  (cost=0.00..18.75 rows=1000 width=4) (actual time=0.032..1.100 rows=1000 loops=1)
                                                                                                            ^^^^
                                                                                                            ||||
                     Filter: (updated <= 1000)
                     Rows Removed by Filter: 100
   ->  Hash  (cost=49.21..49.21 rows=2 width=8) (actual time=0.172..0.172 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Unique  (cost=49.18..49.19 rows=2 width=8) (actual time=0.165..0.168 rows=1 loops=1)
               ->  Sort  (cost=49.18..49.18 rows=2 width=8) (actual time=0.162..0.163 rows=1 loops=1)
                     Sort Key: archive_b.id, archive_b.a_id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Append  (cost=0.00..49.17 rows=2 width=8) (actual time=0.067..0.070 rows=1 loops=1)
                           ->  Seq Scan on archive_b  (cost=0.00..40.98 rows=1 width=8) (actual time=0.041..0.041 rows=0 loops=1)
                                 Filter: ((start_time <= 1000) AND (end_time > 1000) AND (id = 1))
                                 Rows Removed by Filter: 1
                           ->  Index Scan using table_b_pkey on table_b  (cost=0.15..8.17 rows=1 width=8) (actual time=0.024..0.027 rows=1 loops=1)
                                 Index Cond: (id = 1)
                                 Filter: (updated <= 1000)
 Total runtime: 5.455 ms
(24 rows)

2: Good plan:
======================

EXPLAIN ANALYZE 
 SELECT a.id FROM table_a   a JOIN table_b   b ON a.id = b.a_id WHERE b.id = 1 AND a.updated <= 1000 AND b.updated <= 1000
UNION
 SELECT a.id FROM table_a   a JOIN archive_b b ON a.id = b.a_id WHERE b.id = 1 AND a.updated <= 1000 AND b.start_time <= 1000 AND b.end_time > 1000
UNION
 SELECT a.id FROM archive_a a JOIN table_b   b ON a.id = b.a_id WHERE b.id = 1 AND a.start_time <= 1000 AND a.end_time > 1000 AND b.updated <= 1000
UNION
 SELECT a.id FROM archive_a a JOIN archive_b b ON a.id = b.a_id WHERE b.id = 1 AND a.start_time <= 1000 AND a.end_time > 1000 AND b.start_time <= 1000 AND b.end_time > 1000
;

                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=99.39..99.43 rows=4 width=4) (actual time=0.082..0.082 rows=1 loops=1)
   ->  Append  (cost=0.00..99.38 rows=4 width=4) (actual time=0.031..0.075 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..16.55 rows=1 width=4) (actual time=0.030..0.032 rows=1 loops=1)
               ->  Index Scan using table_b_pkey on table_b b  (cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.017 rows=1 loops=1)
                     Index Cond: (id = 1)
                     Filter: (updated <= 1000)
               ->  Index Scan using table_a_pkey on table_a a  (cost=0.00..8.27 rows=1 width=4) (actual time=0.009..0.010 rows=1 loops=1)
                     Index Cond: (a.id = b.a_id)
                     Filter: (a.updated <= 1000)
         ->  Nested Loop  (cost=0.00..16.56 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
               ->  Index Scan using ab_idx on archive_b b  (cost=0.00..8.27 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1)
                     Index Cond: ((start_time <= 1000) AND (end_time > 1000))
                     Filter: (id = 1)
               ->  Index Scan using table_a_pkey on table_a a  (cost=0.00..8.27 rows=1 width=4) (never executed)
                     Index Cond: (a.id = b.a_id)
                     Filter: (a.updated <= 1000)
         ->  Hash Join  (cost=19.06..33.12 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=1)
               Hash Cond: (a.id = b.a_id)
               ->  Bitmap Heap Scan on archive_a a  (cost=10.77..24.01 rows=216 width=4) (actual time=0.004..0.004 rows=0 loops=1)
                     Recheck Cond: ((start_time <= 1000) AND (end_time > 1000))
                     ->  Bitmap Index Scan on aa_idx  (cost=0.00..10.72 rows=216 width=0) (actual time=0.002..0.002 rows=0 loops=1)
                           Index Cond: ((start_time <= 1000) AND (end_time > 1000))
               ->  Hash  (cost=8.27..8.27 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Index Scan using table_b_pkey on table_b b  (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
                           Index Cond: (id = 1)
                           Filter: (updated <= 1000)
         ->  Hash Join  (cost=19.06..33.12 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=1)
               Hash Cond: (a.id = b.a_id)
               ->  Bitmap Heap Scan on archive_a a  (cost=10.77..24.01 rows=216 width=4) (never executed)
                     Recheck Cond: ((start_time <= 1000) AND (end_time > 1000))
                     ->  Bitmap Index Scan on aa_idx  (cost=0.00..10.72 rows=216 width=0) (never executed)

use strict;
use warnings;

system "dropdb testjoin";
system "createdb testjoin";

open F, ">", "testjoin.sql";
print F <<MAIN;
CREATE TABLE table_a  ( updated    INTEGER,                   id SERIAL PRIMARY KEY              );
CREATE TABLE table_b  ( updated    INTEGER,                   id SERIAL PRIMARY KEY, a_id INTEGER);
CREATE TABLE archive_a( start_time INTEGER, end_time INTEGER, id INTEGER                         );
CREATE TABLE archive_b( start_time INTEGER, end_time INTEGER, id INTEGER,            a_id INTEGER);

CREATE INDEX aa_idx ON archive_a (start_time,end_time);
CREATE INDEX ab_idx ON archive_b (start_time,end_time);

CREATE VIEW a AS
	SELECT id FROM archive_a WHERE start_time <= 1000 AND end_time > 1000
	UNION 
	SELECT id FROM table_a WHERE updated <= 1000
	;

CREATE VIEW b AS
	SELECT id,a_id FROM archive_b WHERE start_time <= 1000 AND end_time > 1000
	UNION 
	SELECT id,a_id FROM table_b WHERE updated <= 1000
	;
MAIN

print F "INSERT INTO archive_a (start_time,end_time,id) VALUES (1,1,1200);\n";
print F "INSERT INTO archive_b (start_time,end_time,id,a_id) VALUES (1,1,1200,1200);\n";
print F "INSERT INTO table_a (id,updated) VALUES (DEFAULT,$_);\n" for 1 .. 1100;
print F "INSERT INTO table_b (id,updated,a_id) VALUES (DEFAULT,$_,$_);\n" for 1 .. 10, 1100;
print F "EXPLAIN ANALYZE SELECT a.id FROM a JOIN b ON a.id = b.a_id WHERE b.id = 1;\n";

system "psql testjoin < testjoin.sql";
-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance