regarding ROW comparisons

Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> · Sun, 6 Mar 2011 11:32:27 -0600

I am working with a system which periodically has to perform this operation:

update all of the rows in table A which match rows in table B, using a
subset of the columns for comparison (and one of the columns in tableA
is NULL).

This is what I've tried:

update tableA
SET column1 = some_value
FROM tableB
WHERE
  ROW(tableA.column2, tableA.column4)
  IS NOT DISTINCT FROM
  ROW(tableB.column2, NULL)
AND
  (tableB.column3 = 1 OR tableB.column3 = 2)

I've also tried:

update tableA
SET column1 = some_value
FROM tableB
WHERE
  tableA.column2 = tableB.column2
  AND
  tableA.column4 IS NULL
  AND
  (tableB.column3 = 1 OR tableB.column3 = 2)

and:

update tableA
SET column1 = some_value
WHERE
  EXISTS (
    SELECT 1 FROM tableB
    WHERE
    tableA.column2 = tableB.column2
    AND
    (tableB.column3 = 1 OR tableB.column3 = 2)
  )
  AND
  tableA.column4 IS NULL

assuming column2 from either table will never be NULL, and
tableB.column3 will also never be NULL.
Furthermore, assume that tableA.column4 is usually NULL and that the
number of rows in tableB is typically only a small percentage of the
number of rows in tableA.

The costs associated with each if these look like this:
The basic join version:
 Hash Join  (cost=33441.00..54830.46 rows=302878 width=81)

The ROW version:
 Nested Loop  (cost=0.00..42872.28 rows=1 width=81)

The EXISTS version:
 Hash Semi Join  (cost=18739.28..50014.85 rows=302878 width=81)

The basic join version usually runs fast enough (say, 6-10 seconds).
The EXISTS version starts out faster (typically a bit faster than the
basic join version).
The ROW version never completes (going on 45 minutes now, but I've let
them go for up to 6 hours).

I can only assume I'm doing something wrong.
The column types are nothing special (INT, TEXT, sometimes INET).
There are indexes on tableA but not on table B.

I am using postgresql 8.4.5 and I have tried on both CentOS and
openSUSE with the same results.

-- 
Jon

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general