I have a database with two tables that relate similar data, and a view which projects and combines the data from these two tables in order to access them both in a consistent manner. With enough information, the application can specifically choose to query from one table or the other, but in the more general case the data could come from either table, so I need to query the view. When I join against the view (or an equivalent subselect), however, it looks like the joining condition is not pushed down into the individual components of the union that defines the view. This leads to a significant performance degradation when using the view; I ask the list for help in resolving this problem. The remainder of this email digs into this problem in detail. (If you were interested in background on this database, it implements a backing store for a higher level RDF database, specifically for the RDFLib project. I would be happy to talk more about this application, or the corresponding database design issues, with anyone who might be interested, in whatever forum would be appropriate.) I begin with the poorly performing query, which follows this paragraph. This query joins one of the tables to the view, and using 'explain' on this query gives the query plan listed below the query. Note that in this query plan, the join filter happens after (above) the collection of matching rows from each of the parts of the UNION. <query> select * from relations as component_0_statements cross join URI_or_literal_object as component_1_statements where component_0_statements.predicate = -2875059751320018987 and component_0_statements.object = -2827607394936393903 and component_1_statements.subject = component_0_statements.subject and component_1_statements.predicate = -2875059751320018987 </query> <query-plan> QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=96.31..36201.57 rows=1 width=128) Join Filter: (component_0_statements.subject = literalproperties.subject) -> Index Scan using relations_poscindex on relations component_0_statements (cost=0.00..9.96 rows=1 width=40) Index Cond: ((predicate = (-2875059751320018987)::bigint) AND (object = (-2827607394936393903)::bigint)) -> Append (cost=96.31..36044.62 rows=11759 width=88) -> Bitmap Heap Scan on literalproperties (cost=96.31..16190.72 rows=5052 width=49) Recheck Cond: (literalproperties.predicate = (-2875059751320018987)::bigint) -> Bitmap Index Scan on literalproperties_predicateindex (cost=0.00..95.04 rows=5052 width=0) Index Cond: (literalproperties.predicate = (-2875059751320018987)::bigint) -> Bitmap Heap Scan on relations (cost=128.99..19736.31 rows=6707 width=40) Recheck Cond: (relations.predicate = (-2875059751320018987)::bigint) -> Bitmap Index Scan on relations_predicateindex (cost=0.00..127.32 rows=6707 width=0) Index Cond: (relations.predicate = (-2875059751320018987)::bigint) (13 rows) </query-plan> As it turns out, all of the results are in fact from the 'relations' table, so we get the same results if we query that table instead of the more general view. The corresponding query follows this paragraph, and its query plan immediately follows it. Note that in this query plan, the join condition is pushed down to the leaf node as an Index Condition, which seems to be the main source of the dramatic performance difference. <query> select * from relations as component_0_statements cross join relations as component_1_statements where component_0_statements.predicate = -2875059751320018987 and component_0_statements.object = -2827607394936393903 and component_1_statements.subject = component_0_statements.subject and component_1_statements.predicate = -2875059751320018987 </query> <query-plan> QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..26.11 rows=1 width=80) -> Index Scan using relations_poscindex on relations component_0_statements (cost=0.00..9.96 rows=1 width=40) Index Cond: ((predicate = (-2875059751320018987)::bigint) AND (object = (-2827607394936393903)::bigint)) -> Index Scan using relations_subjectindex on relations component_1_statements (cost=0.00..16.13 rows=1 width=40) Index Cond: (component_1_statements.subject = component_0_statements.subject) Filter: (component_1_statements.predicate = (-2875059751320018987)::bigint) (6 rows) </query-plan> My research led me to a post by Tom Lane describing the conditions in which the WHERE conditions cannot be pushed down to the UNION parts: <http://archives.postgresql.org/pgsql-performance/2007-11/msg00041.php>. I refactored the UNION definition slightly to attempt to bring all the column types into alignment, as that seemed like it might be a blocker, but the problem persists. It didn't look like the other conditions would hold in my case, but I certainly could be wrong. For reference, the definitions of the two tables and the view are listed below. The 'literalproperties' tables has 8229098 rows, and the 'relations' table has 6960820 rows. # \d literalproperties Table "public.literalproperties" Column | Type | Modifiers ----------------+----------------------+----------- subject | bigint | not null subject_term | character(1) | not null predicate | bigint | not null predicate_term | character(1) | not null object | bigint | not null context | bigint | not null context_term | character(1) | not null data_type | bigint | language | character varying(3) | Indexes: "literalproperties_poscindex" UNIQUE, btree (predicate, object, subject, context, data_type, language) "literalproperties_context_termindex" btree (context_term) "literalproperties_contextindex" btree (context) "literalproperties_data_typeindex" btree (data_type) "literalproperties_languageindex" btree (language) "literalproperties_objectindex" btree (object) "literalproperties_predicate_termindex" btree (predicate_term) "literalproperties_predicateindex" btree (predicate) "literalproperties_subject_termindex" btree (subject_term) "literalproperties_subjectindex" btree (subject) # \d relations; Table "public.relations" Column | Type | Modifiers ----------------+--------------+----------- subject | bigint | not null subject_term | character(1) | not null predicate | bigint | not null predicate_term | character(1) | not null object | bigint | not null object_term | character(1) | not null context | bigint | not null context_term | character(1) | not null Indexes: "relations_poscindex" UNIQUE, btree (predicate, object, subject, context) "relations_context_termindex" btree (context_term) "relations_contextindex" btree (context) "relations_object_termindex" btree (object_term) "relations_objectindex" btree (object) "relations_predicate_termindex" btree (predicate_term) "relations_predicateindex" btree (predicate) "relations_subject_termindex" btree (subject_term) "relations_subjectindex" btree (subject) # \d uri_or_literal_object View "public.uri_or_literal_object" Column | Type | Modifiers ----------------+----------------------+----------- subject | bigint | subject_term | character(1) | predicate | bigint | predicate_term | character(1) | object | bigint | object_term | character(1) | context | bigint | context_term | character(1) | data_type | bigint | language | character varying(3) | View definition: SELECT literalproperties.subject, literalproperties.subject_term, literalproperties.predicate, literalproperties.predicate_term, literalproperties.object, 'L'::character(1) AS object_term, literalproperties.context, literalproperties.context_term, literalproperties.data_type, literalproperties.language FROM literalproperties UNION ALL SELECT relations.subject, relations.subject_term, relations.predicate, relations.predicate_term, relations.object, relations.object_term, relations.context, relations.context_term, NULL::bigint AS data_type, NULL::character varying(3) AS language FROM relations; Does anyone have any ideas about how I could better optimize joins against a union (either with a view or a subquery) like this? Thanks, and take care, John L. Clark -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance