Re: [EXTERNAL]: Re: UPSERT in Postgres

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Sat, 8 Apr 2023 14:51:08 -0700

On 4/6/23 17:49, Louis Tian wrote:
Hi Peter,

Thanks for your reply. Appreciate the help and discussion.

  In general UPSERT (or any definition of it that I can think of) does
not imply idempotency.

"Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application." from Wikipedia.
the concept of Idempotence when applies to HTTP is consistent with the above. https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or are you going by a different defintion that I am not aware of?
If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
If a row already exists, the first statement will update the row so does any subsequent statements. executing the same update multiple time is the same as executing it only once.
If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the update has no real effect since it the value is exactly the same as the insert.
So by defintion, upsert is idempotent.

No it is not as Israel Brewster pointed out.

To his example I would add:

alter some_table add column ts_upsert_update timestamptz;

insert into some_table values('foo', 'bar') on conflict(tbl_id) do 
update set foo_fld = excluded.foo_fld, bar_fld = some_table.bar_fld, 
ts_upsert_update = now();

You are substituting whatever definition you have in your head for the 
definition as it actually exists.

It could just be a unique index or a unique constraint. So you can
upsert on any individual unique constraint/index, or the primary key.
Of course there might be several on a given table, but you can only
use one as the "conflict arbiter" per statement.

Understand that I can use any unique constraint with on conflict.
But semantically the only correct one is the primary key, since that's what identifies a row logically.
In that sense, any unique column(s) is a potential candidate for primary key.
It's more of a pedantic point rather than pragmatic one.
It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at implementation level, since all index points to the tuple directly
Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.

Again you are dealing with the imagined instead of the reality. Along 
that line you left out that a 'exclusion constraint violation error' can 
also trigger the ON CONFLICT.

Use some pesudo code might be helpful here to explain the difference.

How on conflict works at the moment.

try {
  insert row
} catch (duplicated key error) {
   update row
}

How I think it upsert should work

And therein lies your problem, you are imagining something that does not 
exist and more to the point will most likely not exist as it would break 
all code that depends on above behavior.

if (new.id exists) {
  update row
} else {
  insert row
}

I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results in a not null constraint error.

To be expected, as after all the command is:

INSERT INTO <some_table> ...

While I am expecting the insert is never executed in the first place when that row already exist (as identified by it primary key). So the update execute without error.
I hope the pesudo code above is enough to clarify the difference?

Cheers,
Louis Tian

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx