Re: insert on conflict

Peter Geoghegan <pg@xxxxxxx> · Tue, 27 Jun 2017 13:30:16 -0700

On Tue, Jun 27, 2017 at 12:40 PM, armand pirvu <armand.pirvu@xxxxxxxxx> wrote:
> so how is it working in fact ? Isn't it working like looping in the
> IVEE.dim_company  and for each company_id if the record does have a
> correspondent in csischema.dim_company then update csischema.dim_company set
> company_name = EXCLUDED.company_name  where company_id=...  ? If so isn't it
> supposed to use the PK for each company_id ? Or is it more like building a
> whole list from IVEE.dim_company and treat like a join ? Just trying to
> understand

The processing here has to happen a tuple at a time. That's because
the index structure itself is the only authoritative source of truth
about whether or not there is a possible conflict. An MVCC snapshot
isn't good enough, because it sees a consistent view of things, not
the true physical reality of what exists or does not exist in the
index.

What you end up with here is a weird nested loop join, you might say.
The implementation couldn't do it any other way (this could never
behave more like a merge join), because we must eagerly check for
conflicts right as we insert (our insert would be registered by
*other* inserters/upserters as a conflict). If the implementation did
ever do it that way, it would break the important UPSERT guarantees
around concurrency.

MERGE does this in other systems, which is okay for those other
systems because MERGE makes no special promises about concurrency
(e.g., you can get a unique violation in the joined-on column with
MERGE). But, MERGE would be faster for bulk loading, which is what
MERGE is good for.

-- 
Peter Geoghegan

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general