Hello, I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE
SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query: MERGE
INTO test_large d
USING
(SELECT ROWID, test_large.*
FROM test_large
WHERE grp
= 1) s
ON
(d.ROWID = s.ROWID) WHEN
MATCHED THEN
UPDATE
SET d.grp = s.grp; It has the following execution plan: --------------------------------------------------------------------------------------------- | Id
| Operation
| Name | Rows
| Bytes | Cost (%CPU)| Time
| --------------------------------------------------------------------------------------------- |
0 | MERGE STATEMENT
| |
1000 | 16000 | 2006
(0)| 00:00:25 | |
1 | MERGE
| TEST_LARGE |
| |
| | |
2 | VIEW
|
| |
| |
| |
3 | NESTED LOOPS
| |
1000 | 220K|
2006 (0)| 00:00:25 | |
4 | TABLE ACCESS BY INDEX ROWID| TEST_LARGE |
1000 | 110K|
1006 (0)| 00:00:13 | |*
5 | INDEX RANGE SCAN
| IX_TL_GRP |
1000 | |
6 (0)| 00:00:01 | |
6 | TABLE ACCESS BY USER ROWID | TEST_LARGE |
1 |
113 | 1
(0)| 00:00:01 | --------------------------------------------------------------------------------------------- In the PostgreSQL version I use the column ctid for this. The above query becomes: UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s WHERE d.ctid = s.ctid; I noticed that the execution plan for such a statement seems to be suboptimal: Update on test_large d (cost=155346.98..160367.08 rows=996 width=53) -> Merge Join (cost=155346.98..160367.08 rows=996 width=53) Merge Cond: (d.ctid = test_large.ctid) -> Sort (cost=154330.57..156833.16 rows=1001033 width=43) Sort Key: d.ctid -> Seq Scan on test_large d (cost=0.00..19366.33 rows=1001033 width=43) -> Sort (cost=1016.40..1018.89 rows=996 width=10) Sort Key: test_large.ctid -> Index Scan using ix_tl_grp on test_large (cost=0.42..966.80 rows=996 width=10) Index Cond: (grp = 1) I expected a tid scan to be used instead of a Seq scan. I did a VACUUM test_large to make sure statistics are valid. I would suspect that a tid scan is even a bit faster than accessing via the primary key, since reading the index is not required. I know I could change the WHERE clause to use the primary key, which results in the following plan: Update on test_large d (cost=0.85..3344.12 rows=995 width=57) -> Nested Loop (cost=0.85..3344.12 rows=995 width=57) -> Index Scan using ix_tl_grp on test_large (cost=0.42..994.84 rows=995 width=14) Index Cond: (grp = 1) -> Index Scan using pk_test_large on test_large d (cost=0.42..2.36 rows=1 width=47) Index Cond: (id = test_large.id) So my question is more "Is a tid scan not possible / useful for an update" than "how do I accelerate this query". BR Dirk
|