Re: [EXTERNAL]: Re: UPSERT in Postgres

Israel Brewster <ijbrewster@xxxxxxxxxx> · Sat, 8 Apr 2023 09:09:39 -0800

> On Apr 6, 2023, at 4:49 PM, Louis Tian <louis.tian@xxxxxxxxxxxxxxxx> wrote:
> 
> Hi Peter,
> 
> Thanks for your reply. Appreciate the help and discussion.
> 
>>  In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
> 
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application." from Wikipedia.
> the concept of Idempotence when applies to HTTP is consistent with the above. https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or are you going by a different defintion that I am not aware of?
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once. 

Not necessarily. Consider the following UPSERT statement:

INSERT INTO customer_order_counts (customer_id,order_count) VALUES (123456, 1) ON CONFLICT (customer_id) DO UPDATE SET order_count=customer_order_counts.order_count+1;

That is completely valid (I tested it), and actually makes sense as something you might want to do - keep track of how many orders a customer has placed, for example if you only keep the order records for 6 months, but still want to know the total number of orders the customer has placed. If it is a new customer, you insert a record for the customer with an order count of 1. Otherwise, you update the record to increment the order count. Clearly this is NOT an idempotent operation - every time you run it, it changes the order count, so the state of the database does NOT remain the same as if you only execute it once.

> If a row already exists, the first statement will update the row so does any subsequent statements. executing the same update multiple time is the same as executing it only once.
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the update has no real effect since it the value is exactly the same as the insert. 
> So by defintion, upsert is idempotent.
> 
>> It could just be a unique index or a unique constraint. So you can
>> upsert on any individual unique constraint/index, or the primary key.
>> Of course there might be several on a given table, but you can only
>> use one as the "conflict arbiter" per statement.
> 
> Understand that I can use any unique constraint with on conflict. 
> But semantically the only correct one is the primary key, since that's what identifies a row logically.

Not really. Generally when I am doing an UPSERT, I am NOT using the primary key, but rather some other UNIQUE-ly indexed column(s). My primary key is typically an ID column that is defined as a serial, and automatically generated by the database. The unique column I use for the upset, however, is generally something that would identify the row to a human - such as Station ID and timestamp  columns in a database I have of seismic readings. Each reading gets a unique ID (the primary key) that identifies it to the database and is used in joins. However, occasionally a reading gets updated, so I update the row, based not on the primary key (which I don’t know for incoming data), but on the station and timestamp. UPSERT, but not based on the primary key.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> In that sense, any unique column(s) is a potential candidate for primary key.
> It's more of a pedantic point rather than pragmatic one.
> It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at implementation level, since all index points to the tuple directly
> Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.