Hello,
We have pretty big production database (running PostgreSQL 8.3.1) with
many partitioned tables. In most cases, they work well (since 8.2.1 at
least) -- constraint exclusion is able to select correct partitions.
However, there is an exception: queries on partitioned tables using
PostgreSQL 'UPDATE Foo ... FROM Bar' syntax extension.
Here is a simple test case:
------------------
CREATE TABLE bar ( bar_id INT NOT NULL PRIMARY KEY );
INSERT INTO bar VALUES ( 1 ), ( 2 ), ( 3 );
CREATE TABLE foo (
part INT NOT NULL
,foo_data INT
,bar_id INT NOT NULL REFERENCES bar( bar_id )
);
CREATE TABLE foo_1 ( CHECK ( part = 1 ) ) INHERITS ( foo );
INSERT INTO foo_1 ( part, bar_id ) VALUES ( 1, 1 ), ( 1, 3 );
CREATE TABLE foo_2 ( CHECK ( part = 2 ) ) INHERITS ( foo );
INSERT INTO foo_2 ( part, bar_id ) VALUES ( 2, 2 ), ( 2, 3 );
CREATE TABLE foo_3 ( CHECK ( part = 3 ) ) INHERITS ( foo );
INSERT INTO foo_3 ( part, bar_id ) VALUES ( 3, 1 ), ( 3, 2 );
------------------
As you can see, table "Foo" is partitioned by column "part". If only
"Foo" is referenced in update, query plan is fine:
=> EXPLAIN UPDATE foo SET foo_data = 10 WHERE part = 2;
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..68.50 rows=20 width=14)
-> Seq Scan on foo (cost=0.00..34.25 rows=10 width=14)
Filter: (part = 2)
-> Seq Scan on foo_2 foo (cost=0.00..34.25 rows=10 width=14)
Filter: (part = 2)
(5 rows)
However, for this query it is far from being optimal:
=> EXPLAIN UPDATE foo SET foo_data = 10 FROM bar WHERE part = 2 AND
foo.bar_id = bar.bar_id;
QUERY PLAN
--------------------------------------------------------------------------------
Append (cost=0.00..nan rows=22 width=14)
-> Nested Loop (cost=0.00..73.05 rows=10 width=14)
-> Seq Scan on foo (cost=0.00..34.25 rows=10 width=14)
Filter: (part = 2)
-> Index Scan using bar_pkey on bar (cost=0.00..3.87 rows=1
width=4)
Index Cond: (bar.bar_id = public.foo.bar_id)
-> Merge Join (cost=nan..nan rows=1 width=8)
Merge Cond: (public.foo.bar_id = bar.bar_id)
-> Sort (cost=0.02..0.03 rows=1 width=0)
Sort Key: public.foo.bar_id
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
-> Sort (cost=168.75..174.75 rows=2400 width=4)
Sort Key: bar.bar_id
-> Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4)
-> Nested Loop (cost=0.00..73.05 rows=10 width=14)
-> Seq Scan on foo_2 foo (cost=0.00..34.25 rows=10 width=14)
Filter: (part = 2)
-> Index Scan using bar_pkey on bar (cost=0.00..3.87 rows=1
width=4)
Index Cond: (bar.bar_id = public.foo.bar_id)
-> Merge Join (cost=nan..nan rows=1 width=8)
Merge Cond: (public.foo.bar_id = bar.bar_id)
-> Sort (cost=0.02..0.03 rows=1 width=0)
Sort Key: public.foo.bar_id
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
-> Sort (cost=168.75..174.75 rows=2400 width=4)
Sort Key: bar.bar_id
-> Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4)
(29 rows)
Is there any way to avoid this anomaly? UPDATE ... FROM is very
convenient if you have to update rows depending on conditions involving
multiple tables. In addition, with partitioned tables,
standard-conforming UPDATE foo ... WHERE pk IN (SELECT pk FROM foo,
bar...) is even worse because query planner cannot choose correct
partitions without nested select results and so it does a complete scan
of all partitions instead.
Alex
-
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general