Re: Network Flow Schema + Bulk Import/Updates

Tony Wasson <ajwasson@xxxxxxxxx> · Wed, 21 Sep 2005 12:04:20 -0700

On 9/20/05, Michael L. Artz <dragon@xxxxxxxxxxxxx> wrote:
> I'm fairly new at Postgres and had some basic design questions.  My
> problem is basically that I want to do large bulk imports (millions of
> rows) into a large DB (billions of rows) where there might already be
> data that needs to be updated instead of inserting a new row.  I read a
> similar post a few days ago, but I just had a couple more questions.

You can use the merge trigger below to do this. You'll need to add
some code to update the count. You may also benefit from using the new
constraint exclusion (table partitioning) in PostgreSQL 8.1. I am not
sure if CE works against the inet datatype -- if not, try converting
the IP to an integer.

This merge function is from an earlier version by Mike Rylander I got
from here:
http://archives.postgresql.org/pgsql-sql/2004-05/msg00135.php

His version worked fine, all mistakes are my own. I wanted to allow
multiple key columns.

--
-- Merge on INSERT functionallity for Postgres 8.0+
--
-- Original Author: miker ( at ) purplefrog ( dot ) com / 5-14-04
-- ajwasson (at) gmail (dot) cot -- Added support for multiple key
columns 8-20-05
--
-- CAVEAT EMPTOR: Uses table locks to avoid concurrency issues,
-- 		  so it WILL slow down heavily loaded tables.
-- 		  This effecivly puts the table into
-- 		  TRANSACTION ISOLATION LEVEL SERIALIZABLE mode.
--
-- NOTE: You don't want to use key columns that are NULLable.

CREATE OR REPLACE FUNCTION add_merge_on_insert2(
  tablename TEXT,  -- table name
  keycols TEXT[],  -- key columns
  updatecols TEXT[] -- columns to update is key columns match
) RETURNS TEXT
AS $BODY$

DECLARE
  trig  TEXT;
  upclause  TEXT := '';
  keyclause TEXT := '';
  out_msg TEXT;
BEGIN
  -- setup the where clause with all key columns (probably primary keys)
  FOR a IN 1 .. array_upper(keycols,1) LOOP
    keyclause := keyclause || quote_ident(keycols[a]) || ' = NEW.'
      || quote_ident(keycols[a]) || ' AND ';
  END LOOP;
  --trim the last AND
  keyclause := trim(trailing ' AND ' FROM keyclause);

  -- setup the columns to UPDATE
  FOR i IN 1 .. array_upper(updatecols,1) LOOP
    upclause := upclause || quote_ident(updatecols[i])
      || ' = COALESCE(NEW.' || quote_ident(updatecols[i])
      || ', orig.' || quote_ident(updatecols[i]) || '), ';
  END LOOP;
  --trim the last comma and space
  upclause := trim(trailing ', ' FROM upclause);

  ----- put together the function now
EXECUTE 'CREATE FUNCTION "' || tablename || '_merge_on_insert_f" ()
RETURNS TRIGGER AS $$
  DECLARE
    orig ' || quote_ident(tablename) || '%ROWTYPE;
  BEGIN
    -- NOTE: This function was dynamically built by add_merge_on_insert2
    LOCK TABLE ' || quote_ident(tablename) || ' IN ROW EXCLUSIVE MODE;

    SELECT INTO orig * FROM  ' || quote_ident(tablename) || ' WHERE '
|| keyclause || ';

    IF NOT FOUND THEN
      RETURN NEW;
    END IF;

    UPDATE ' || quote_ident(tablename) || ' SET ' || upclause || '
WHERE ' || keyclause || ';

    RETURN NULL;
  END;
  $$ LANGUAGE plpgsql
 '; -- end of execute

EXECUTE 'CREATE TRIGGER "' || tablename || '_merge_on_insert_t" BEFORE INSERT
    ON ' || quote_ident(tablename) || ' FOR EACH ROW
    EXECUTE PROCEDURE "' || tablename || '_merge_on_insert_f" ();
  '; -- end of execute

  out_msg := 'FUNCTION ' || tablename || '_merge_on_insert_f ();
TRIGGER ' || tablename || '_merge_on_insert_t';
    RETURN out_msg;
END;
$BODY$ LANGUAGE 'plpgsql';

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match