Re: insert on conflict

armand pirvu <armand.pirvu@xxxxxxxxx> · Tue, 27 Jun 2017 15:43:35 -0500

On Jun 27, 2017, at 3:30 PM, Peter Geoghegan <pg@xxxxxxx> wrote:

On Tue, Jun 27, 2017 at 12:40 PM, armand pirvu <armand.pirvu@xxxxxxxxx> wrote:
so how is it working in fact ? Isn't it working like looping in the
IVEE.dim_company  and for each company_id if the record does have a
correspondent in csischema.dim_company then update csischema.dim_company set
company_name = EXCLUDED.company_name  where company_id=...  ? If so isn't it
supposed to use the PK for each company_id ? Or is it more like building a
whole list from IVEE.dim_company and treat like a join ? Just trying to
understand

The processing here has to happen a tuple at a time. That's because
the index structure itself is the only authoritative source of truth
about whether or not there is a possible conflict. An MVCC snapshot
isn't good enough, because it sees a consistent view of things, not
the true physical reality of what exists or does not exist in the
index.

What you end up with here is a weird nested loop join, you might say.
The implementation couldn't do it any other way (this could never
behave more like a merge join), because we must eagerly check for
conflicts right as we insert (our insert would be registered by
*other* inserters/upserters as a conflict). If the implementation did
ever do it that way, it would break the important UPSERT guarantees
around concurrency.

MERGE does this in other systems, which is okay for those other
systems because MERGE makes no special promises about concurrency
(e.g., you can get a unique violation in the joined-on column with
MERGE). But, MERGE would be faster for bulk loading, which is what
MERGE is good for.

-- 
Peter Geoghegan

Hi Peter

So for example if IVEE.dim_company has 10k rows and csischema.dim_company has 40 rows, what will happen for each row in IVEE.dim_company we check csischema.dim_company and if the check tells row is in it switches to update and this would mean yes I scan IVEE.dim_company, however should an update be needed in csischema.dim_company it will use the csischema.dim_company PK since we pass one value gotten from IVEE.dim_company
The question I guess is what happens IF I IVEE.dim_company accounts for far more than 5% of csischema.dim_company ? Will that translate into a scan on csischema.dim_company ?

What I am looking at now looks like a potential racing contention which so I am wondering if there are better ways to do it

Thanks
Armand