2011/5/13 Denis de Bernardy <ddebernardy@xxxxxxxxx>: > I might have misread, but: > >> select * from connections where locked_by <> 4711 >> union all >> select * from connections_locked where locked_by = 4711; > > > The first part will result in a seq scan irrespective of indexes, and the second has no index on locked_by. The best you can do is to eliminate the seq scan on the second by adding the missing index on locked_by. just rework the primary key to set the locked_id first should work. > > That said, note that index usage depends on your data distribution: postgres may identify that it'll read most/all of the table anyway, and opt to do a (cheaper) seq scan instead. Fredrick, What indexes Oracle did choose ? (index-only scan ?) > > D > > > ----- Original Message ----- >> From: Fredrik Widlert <fredrik.widlert@xxxxxxxxx> >> To: pgsql-performance@xxxxxxxxxxxxxx >> Cc: >> Sent: Friday, May 13, 2011 1:55 PM >> Subject: How to avoid seq scans for joins between union-all views (test case included) >> >> Hi everyone >> >> We have recently started to port an application from Oracle to PostgreSQL. >> So far, we are amazed with how great most things work. >> >> However, we have run into performance problems in one type of query which >> is quite common in our application. We have created a (simplified) >> reproducible test case which (hopefully!) creates all necessary tables >> and data to >> show the problem. >> >> Plain-text description of the data model in the test case: >> >> We have a set of objects (like electrical cables), each having >> two nodes in the table "connections" (think of these two rows together >> as an edge in a graph). >> >> Another table "connections_locked" contains rows for some of >> the same objects, which are locked by a long transaction. >> >> The view connections_v performs a union all of the rows from >> "connections" which are not modified in the current long >> transaction with the rows from "connections_locked" which >> are modified in the current long transaction. >> >> Goal: >> Given an object id, we want to find all neighbors for this >> object (that is, objects which share a node with this object). >> >> Problem: >> We think that our query used to find neighbors would benefit >> greatly from using some of our indexes, but we fail to make it >> do so. >> >> >> Over to the actual test case: >> >> ---------------------------------------------- >> >> -- Tested on (from select version ()): >> -- PostgreSQL 9.0.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) >> 4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit >> -- PostgreSQL 9.1beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) >> 4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit >> >> -- Ubuntu 11.04, uname -a output: >> -- Linux <hostname> 2.6.38-8-generic-pae #42-Ubuntu SMP Mon Apr 11 >> 05:17:09 UTC 2011 i686 i686 i386 GNU/Linux >> -- Processor: Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz >> -- Drive: Intel X25-M SSD >> >> >> drop table if exists connections cascade; >> drop table if exists connections_locked cascade; >> >> >> create table connections ( >> con_id serial primary key, >> locked_by integer not null, >> obj_id integer not null, >> node integer not null >> ); >> >> >> -- create test nodes, two per obj_id >> insert into connections (locked_by, obj_id, node) >> select 0, n/2, 1000 + (n + 1)/2 from generate_series (1,500000) as n; >> >> create index connections_node_idx on connections (node); >> create index connections_obj_idx on connections (obj_id); >> vacuum analyze connections; >> >> >> >> create table connections_locked ( >> con_id integer not null, >> locked_by integer not null, >> obj_id integer not null, >> node integer not null, >> constraint locked_pk primary key (con_id, locked_by) >> ); >> >> -- mark a few of the objects as locked by a long transaction >> insert into connections_locked (con_id, locked_by, obj_id, node) >> select n, 1 + n/50, n/2, 1000 + (n + 1)/2 from generate_series (1,25000) as n; >> >> create index connections_locked_node_idx on connections_locked (node); >> create index connections_locked_obj_idx on connections_locked (obj_id); >> vacuum analyze connections_locked; >> >> >> -- Create a view showing the world as seen by long transaction 4711. >> -- In real life, this uses a session variable instead of a hard-coded value. >> create or replace view connections_v as >> select * from connections where locked_by <> 4711 >> union all >> select * from connections_locked where locked_by = 4711; >> >> >> -- This is the query we are trying to optimize. >> -- We expect this to be able to use our indexes, but instead get >> sequential scans >> explain analyze >> select >> con2.obj_id >> from >> connections_v con1, >> connections_v con2 >> where >> con1.obj_id = 17 and >> con2.node = con1.node >> ; >> >> >> -- Output: >> -- Hash Join (cost=16.69..16368.89 rows=7501 width=4) (actual >> time=0.096..778.830 rows=4 loops=1) >> -- Hash Cond: ("*SELECT* 1".node = "*SELECT* 1".node) >> -- -> Append (cost=0.00..14402.00 rows=500050 width=8) (actual >> time=0.011..640.163 rows=500000 loops=1) >> -- -> Subquery Scan on "*SELECT* 1" (cost=0.00..13953.00 >> rows=500000 width=8) (actual time=0.011..430.645 rows=500000 loops=1) >> -- -> Seq Scan on connections (cost=0.00..8953.00 >> rows=500000 width=16) (actual time=0.009..178.535 rows=500000 loops=1) >> -- Filter: (locked_by <> 4711) >> -- -> Subquery Scan on "*SELECT* 2" (cost=0.00..449.00 >> rows=50 width=8) (actual time=3.254..3.254 rows=0 loops=1) >> -- -> Seq Scan on connections_locked >> (cost=0.00..448.50 rows=50 width=16) (actual time=3.253..3.253 rows=0 >> loops=1) >> -- Filter: (locked_by = 4711) >> -- -> Hash (cost=16.66..16.66 rows=3 width=4) (actual >> time=0.028..0.028 rows=2 loops=1) >> -- Buckets: 1024 Batches: 1 Memory Usage: 1kB >> -- -> Append (cost=0.00..16.66 rows=3 width=4) (actual >> time=0.013..0.025 rows=2 loops=1) >> -- -> Subquery Scan on "*SELECT* 1" >> (cost=0.00..8.35 >> rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1) >> -- -> Index Scan using connections_obj_idx on >> connections (cost=0.00..8.33 rows=2 width=16) (actual >> time=0.012..0.014 rows=2 loops=1) >> -- Index Cond: (obj_id = 17) >> -- Filter: (locked_by <> 4711) >> -- -> Subquery Scan on "*SELECT* 2" >> (cost=0.00..8.30 >> rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1) >> -- -> Index Scan using connections_locked_obj_idx >> on connections_locked (cost=0.00..8.29 rows=1 width=16) (actual >> time=0.007..0.007 rows=0 loops=1) >> -- Index Cond: (obj_id = 17) >> -- Filter: (locked_by = 4711) >> >> >> >> -- Rewriting the query to an almost-equivalent form yields almost the >> same result (that is, seq scans) >> explain analyze >> select >> con2.obj_id >> from >> connections_v con2 >> where con2.node in (select node from connections_v con1 where >> con1.obj_id = 17); >> >> >> -- Simplifying the query even more to use a sub-select with a >> hard-coded value still results in seq scans >> explain analyze >> select >> con2.obj_id >> from >> connections_v con2 >> where con2.node in (select 1015); >> >> >> -- Finally, when we simplify even more and just use a constant, we get >> the index accesses we were hoping >> -- for all along. >> explain analyze >> select >> con2.obj_id >> from >> connections_v con2 >> where con2.node in (1015); >> >> -- Result (cost=0.00..16.66 rows=3 width=4) (actual time=0.048..0.079 >> rows=2 loops=1) >> -- -> Append (cost=0.00..16.66 rows=3 width=4) (actual >> time=0.047..0.076 rows=2 loops=1) >> -- -> Subquery Scan on "*SELECT* 1" (cost=0.00..8.35 >> rows=2 >> width=4) (actual time=0.046..0.049 rows=2 loops=1) >> -- -> Index Scan using connections_node_idx on >> connections (cost=0.00..8.33 rows=2 width=16) (actual >> time=0.046..0.048 rows=2 loops=1) >> -- Index Cond: (node = 1015) >> -- Filter: (locked_by <> 4711) >> -- -> Subquery Scan on "*SELECT* 2" (cost=0.00..8.30 >> rows=1 >> width=4) (actual time=0.025..0.025 rows=0 loops=1) >> -- -> Index Scan using connections_locked_node_idx on >> connections_locked (cost=0.00..8.29 rows=1 width=16) (actual >> time=0.024..0.024 rows=0 loops=1) >> -- Index Cond: (node = 1015) >> -- Filter: (locked_by = 4711) >> >> >> >> ------- end of test case ----- >> >> Can someone explain what is happening here? Is there some way we can >> rewrite our query or some setting we could turn on or off to get the >> optimizer to choose to use our indexes? >> >> (testing with "set enable_seqscan = false;" does not make a difference >> as far as we can see) >> >> To verify that we have really created all necessary indexes, we have >> converted this simplified test case to Oracle syntax and tested it on >> our Oracle server. In this case, we do get the expected index accesses, >> so we think that we have in fact managed to isolate the problem using >> this test case. >> >> What we are hoping for: >> Since we have lots of queries joining these kind of "union all"-views >> between a master table and a transaction table, we would be really >> glad to hear something like "when you use these kinds of views, you >> need to do X, Y and Z to get good performance". >> >> Thanks in advance for any help! >> /Fredrik >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ ; PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance