Re: INSERT ... ON CONFLICT DO UPDATE

Francisco Olarte <folarte@xxxxxxxxxxxxxx> · Tue, 21 Jul 2015 09:21:21 +0200

Hi Rafal:

On Mon, Jul 20, 2015 at 3:33 PM, Rafal Pietrak <rafal@xxxxxxxxx> wrote:
> 3. there are methods (like cryptographic "random" sequence), which
> guarantee no conflicts. So one should resort to that.
> Regarding the last point. Usually, I implement one-time used vouchers as
> rows in table like:
> CREATE TABLE (voucher int not null, consumed bool, expire timestamp not
> null default timestamp_pl_interval(now()::timestamp, '2
> min'::interval),..., unique (voucher,consumed) );
> with CONSUMED column NULLyfied when voucher is used. The entire row of
> consumed voucher is purged after clearence and verification, which
> happen significantly later.

Random as a primary key is considered a bad practice by many people
with much experience, nullyfing it too. Many people even frown on just
changing the primary key ( and one of the reasons for using serial as
keys in many situations is to have a guaranteed not null unchanging
value ).

> Such short lived (when active) voucher is usually just 6-digit long, to
> help people enter it.

Then, random and with a narrow value domain, they make, IMNSHO,  a
really bad  choice for primery keys.

> I don't know much about cryptography, but would a generic encryption
> function (like that indicated by Daniel) have the same "waking through
> the entire range-space" behavior as the original when that range-space
> is externally (by my application) truncated to those 6 digits? If not,
> would it be as efficient in conflict avoidance as used with original
> 32-bit range-space?

An encryption function never has collisions ( do not confuse with a
hash ). If it had you would be unable to decrypt it. The problem is
the value domain for you. i.e., for your example you could choose a
bit stream cipher applied to a 20 bit value. This is a moderately
complex prolem to find or build ( from the classic cryptographic
primitives nearly every language includes ). This will map every
different 20 bit input value to a different 20 bit output value, so
your value domain will be 20 bit numbers, your inputs will be the 10^6
6 digit numbers and the outputs will be 10^6 DIFFERENT 20bit numbers,
of wich you could expect about 4.8% of them ( 2^20-10^6)/10^6 to have
7 digits ( with a leading one in this case ). To solve that problem
you could use 19 digit input/output numbers or try to fin a decimal
cypher which uses exactly 10^6 input digits. If you use a 32 bit block
cypher it will not have collisions, but if you TRUNCATE the 32 bit ~
9.5 digits output to 6 digits, you are no longer encrypting. You may
call it hashing or whatever, but that is NOTt encryption, you would
have collisions.

> Then again. Is it really a "good practice" to rely on a programmer to
> peek "proper/correct encryption helper" instead of providing him/her
> with a database-integrated tool for a "well defined" and not so rare
> usage scenario as "random default" for UNIQUE/PK column?

Many of us are too old to get caught by this. This question is like
asking "Is it good practice to hit a person with a 10 pound hammer in
the head instead of giving a cookie?". There are other options.

IMO NOT modifying a very complex chunk of code ( the server code doing
the inserts and checking the collision cases and acting on them, plus
the parser for insert queries plus .... ) and risking all the bugs it
may introduce to help with inserting random pk is good practice. It
doesn't matter if the requesting programmer peeks a bad encryption
methods, keeps his old code for inserting random ids or introduces
bugs in his program, the potential harm to the rest of the users is
too great.

> So my conclusion from this thread is that as this usage scenario does
> not seem to be foreseen by current implementation of ON CONFLICT
> transaction, a workaround exists (like: cryptographic range-walker).
> Being it a workaround, I'd vote for some direct supported of that
> scenario in the future at database level.

Bear in mind your problem is totally ill defined. I mean, you want to
insert a random 6 digits ID, and want the database to keep trying
until it finds an unique one. What should it do if there already are
10^6 records in the db? Stall until a free one is found? abort? This
kind of uses is very narrow, and very difficult to get right , and
normally confined to the application domain. Even if you choose a
totally correct encryption function for collision avoidance, like
identity, you are going to have problems in your scheme.

You are not voting for anything, you need a feature proposal to vote
upon. So far the only one I could extract from this thread is
"something which magically solves the current Rafal problem". I would
vote against that.

Francisco Olarte.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general