On Tue, Apr 11, 2023, at 4:22 PM, Tom Lane wrote:
Federico <cfederico87@xxxxxxxxx> writes:> Searching the archive seems that a using the INSERT SELECT ORDER BY> form should be a better solution,> so the above insert should be rewritten as> INSERT INTO t(data)> SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,> num) ORDER BY num> RETURNING id> to ensure that the id are created in the order specified by num. The> returned id can again be in> arbitrary order, but sorting them should enable correctly matching the> orm object so that they can> be properly updated.> Is this correct?No. Sadly, adding that ORDER BY is just voodoo programming, becauseit applies to the result of the SELECT while promising nothing aboutthe order in which INSERT/RETURNING will act on those rows.Re-reading that 2012 thread, the main new observation I'd make todayis that parallel operation is a thing now, and it's not hard to foreseethat sometime soon we'll want to parallelize INSERTs. Which'd make it*really* hard to promise anything about the order of RETURNING output.
if I can state this without having RETURNING getting in the way, because we know RETURNING is not ordered.
Start with this table:
CREATE TABLE mytable (
id SERIAL PRIMARY KEY,
a INT,
b INT
)
Then insert two rows, where id SERIAL fires implicitly, assume the next value the sequence will give us is N1, and then the value after that is N2. It doesn't matter what N1 and N2 are (don't need to be consecutive) but we want N2 > N1, that is, increasing.
INSERT INTO mytable (a, b) SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num
Then SELECT with ORDER BY:
SELECT id, a, b FROM mytable ORDER BY id
We want the results to be:
(N1, 10, 11)
(N2, 12, 13)
and we dont want them to *ever* be:
(N1, 12, 13)
(N2, 10, 11)
that is, we want the SERIAL column (or an IDENTITY also) to be lined up with the VALUES.
From what you wrote in https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us , that seems to be exactly what you've stated, where this statement:
INSERT INTO table (a, b) SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num
is organized by the query planner to essentially be equivalent to this, where the nextval() is part of the SELECTed data:
INSERT INTO mytable (id, a, b) SELECT nextval('mytable_id_seq'), p1, p2 FROM
(SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num) as _x
in practice, we add "RETURNING id" and expect those "id" cols to be in increasing order, so we sort to match it up with the input rows.
I think if you want to use RETURNING with multi-row inserts, thething to do is more likeINSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, idand then explicitly match up the returned "data" values rather thanpresuming they appear in the same order you wrote them in in VALUES.
we're going to do that also when the table has something like a uuid for a primary key or otherwise.
Admittedly this might be problematic if some of the VALUES rowsare identical, but how much should you care?
we only do any of this if the rows have something unique in them we can hook onto.
regards, tom lane
thanks so much for replying!