How to avoid seq scans for joins between union-all views (test case included)

Fredrik Widlert <fredrik.widlert@xxxxxxxxx> · Fri, 13 May 2011 13:55:46 +0200

Hi everyone

We have recently started to port an application from Oracle to PostgreSQL.
So far, we are amazed with how great most things work.

However, we have run into performance problems in one type of query which
is quite common in our application. We have created a (simplified)
reproducible test case which (hopefully!) creates all necessary tables
and data to
show the problem.

Plain-text description of the data model in the test case:

We have a set of objects (like electrical cables), each having
two nodes in the table "connections" (think of these two rows together
as an edge in a graph).

Another table "connections_locked" contains rows for some of
the same objects, which are locked by a long transaction.

The view connections_v performs a union all of the rows from
"connections" which are not modified in the current long
transaction with the rows from "connections_locked" which
are modified in the current long transaction.

Goal:
Given an object id, we want to find all neighbors for this
object (that is, objects which share a node with this object).

Problem:
We think that our query used to find neighbors would benefit
greatly from using some of our indexes, but we fail to make it
do so.

Over to the actual test case:

----------------------------------------------

-- Tested on (from select version ()):
-- PostgreSQL 9.0.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit
-- PostgreSQL 9.1beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-46), 32-bit

-- Ubuntu 11.04, uname -a output:
-- Linux <hostname> 2.6.38-8-generic-pae #42-Ubuntu SMP Mon Apr 11
05:17:09 UTC 2011 i686 i686 i386 GNU/Linux
-- Processor: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
-- Drive: Intel X25-M SSD

drop table if exists connections cascade;
drop table if exists connections_locked cascade;

create table connections (
  con_id serial primary key,
  locked_by integer not null,
  obj_id integer not null,
  node integer not null
);

-- create test nodes, two per obj_id
insert into connections (locked_by, obj_id, node)
select 0, n/2, 1000 + (n + 1)/2 from generate_series (1,500000) as n;

create index connections_node_idx on connections (node);
create index connections_obj_idx on connections (obj_id);
vacuum analyze connections;

create table connections_locked (
  con_id integer not null,
  locked_by integer not null,
  obj_id integer not null,
  node integer not null,
  constraint locked_pk primary key (con_id, locked_by)
);

-- mark a few of the objects as locked by a long transaction
insert into connections_locked (con_id, locked_by, obj_id, node)
select n, 1 + n/50, n/2, 1000 + (n + 1)/2 from generate_series (1,25000) as n;

create index connections_locked_node_idx on connections_locked (node);
create index connections_locked_obj_idx on connections_locked (obj_id);
vacuum analyze connections_locked;

-- Create a view showing the world as seen by long transaction 4711.
-- In real life, this uses a session variable instead of a hard-coded value.
create or replace view connections_v as
select * from connections where locked_by <> 4711
union all
select * from connections_locked where locked_by = 4711;

-- This is the query we are trying to optimize.
-- We expect this to be able to use our indexes, but instead get
sequential scans
explain analyze
select
     con2.obj_id
from
     connections_v con1,
     connections_v con2
where
     con1.obj_id = 17 and
     con2.node = con1.node
;

-- Output:
-- Hash Join  (cost=16.69..16368.89 rows=7501 width=4) (actual
time=0.096..778.830 rows=4 loops=1)
--   Hash Cond: ("*SELECT* 1".node = "*SELECT* 1".node)
--   ->  Append  (cost=0.00..14402.00 rows=500050 width=8) (actual
time=0.011..640.163 rows=500000 loops=1)
--         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..13953.00
rows=500000 width=8) (actual time=0.011..430.645 rows=500000 loops=1)
--               ->  Seq Scan on connections  (cost=0.00..8953.00
rows=500000 width=16) (actual time=0.009..178.535 rows=500000 loops=1)
--                     Filter: (locked_by <> 4711)
--         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..449.00
rows=50 width=8) (actual time=3.254..3.254 rows=0 loops=1)
--               ->  Seq Scan on connections_locked
(cost=0.00..448.50 rows=50 width=16) (actual time=3.253..3.253 rows=0
loops=1)
--                     Filter: (locked_by = 4711)
--   ->  Hash  (cost=16.66..16.66 rows=3 width=4) (actual
time=0.028..0.028 rows=2 loops=1)
--         Buckets: 1024  Batches: 1  Memory Usage: 1kB
--         ->  Append  (cost=0.00..16.66 rows=3 width=4) (actual
time=0.013..0.025 rows=2 loops=1)
--               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..8.35
rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1)
--                     ->  Index Scan using connections_obj_idx on
connections  (cost=0.00..8.33 rows=2 width=16) (actual
time=0.012..0.014 rows=2 loops=1)
--                           Index Cond: (obj_id = 17)
--                           Filter: (locked_by <> 4711)
--               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..8.30
rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1)
--                     ->  Index Scan using connections_locked_obj_idx
on connections_locked  (cost=0.00..8.29 rows=1 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
--                           Index Cond: (obj_id = 17)
--                           Filter: (locked_by = 4711)

-- Rewriting the query to an almost-equivalent form yields almost the
same result (that is, seq scans)
explain analyze
select
    con2.obj_id
from
   connections_v con2
   where con2.node in (select node from connections_v con1 where
con1.obj_id = 17);

-- Simplifying the query even more to use a sub-select with a
hard-coded value still results in seq scans
explain analyze
select
    con2.obj_id
from
   connections_v con2
   where con2.node in (select 1015);

-- Finally, when we simplify even more and just use a constant, we get
the index accesses we were hoping
-- for all along.
explain analyze
select
    con2.obj_id
from
   connections_v con2
   where con2.node in (1015);

-- Result  (cost=0.00..16.66 rows=3 width=4) (actual time=0.048..0.079
rows=2 loops=1)
--   ->  Append  (cost=0.00..16.66 rows=3 width=4) (actual
time=0.047..0.076 rows=2 loops=1)
--         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..8.35 rows=2
width=4) (actual time=0.046..0.049 rows=2 loops=1)
--               ->  Index Scan using connections_node_idx on
connections  (cost=0.00..8.33 rows=2 width=16) (actual
time=0.046..0.048 rows=2 loops=1)
--                     Index Cond: (node = 1015)
--                     Filter: (locked_by <> 4711)
--         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..8.30 rows=1
width=4) (actual time=0.025..0.025 rows=0 loops=1)
--               ->  Index Scan using connections_locked_node_idx on
connections_locked  (cost=0.00..8.29 rows=1 width=16) (actual
time=0.024..0.024 rows=0 loops=1)
--                     Index Cond: (node = 1015)
--                     Filter: (locked_by = 4711)

------- end of test case -----

Can someone explain what is happening here? Is there some way we can
rewrite our query or some setting we could turn on or off to get the
optimizer to choose to use our indexes?

(testing with "set enable_seqscan = false;" does not make a difference
as far as we can see)

To verify that we have really created all necessary indexes, we have
converted this simplified test case to Oracle syntax and tested it on
our Oracle server. In this case, we do get the expected index accesses,
so we think that we have in fact managed to isolate the problem using
this test case.

What we are hoping for:
Since we have lots of queries joining these kind of "union all"-views
between a master table and a transaction table, we would be really
glad to hear something like "when you use these kinds of views, you
need to do X, Y and Z to get good performance".

Thanks in advance for any help!
/Fredrik

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance