Re: Guidance on INSERT RETURNING order

Federico <cfederico87@xxxxxxxxx> · Tue, 11 Apr 2023 23:38:20 +0200

On Tue, 11 Apr 2023 at 23:31, Rob Sargent <robjsargent@xxxxxxxxx> wrote:
>
> On 4/11/23 14:37, Federico wrote:
> >
> > The problem here is not having the auto increment id in a particular
> > order, is that there
> > is apparently no correlation with the position of an element in the
> > values clause with the
> > id generated. That's the reason for using the sentinel column in the
> > general solution in the previous message.
> >
> > The extend on the use case, SQLAlchemy has 3 objects T that have
> > T(data='a'), T(data='b'), T(data='c') but no
> > value for the id column. The objective is to insert the 3 data values,
> > get back the ids and correctly match them with
> > the correct 3 objects.
> >
> >> No.  Sadly, adding that ORDER BY is just voodoo programming, because
> >> it applies to the result of the SELECT while promising nothing about
> >> the order in which INSERT/RETURNING will act on those rows.
> > I wasn't probably clear, it's fine if INSERT/RETURNING order is
> > arbitrary, what matters is that the
> > autoincementing values is executed in the same order as select, like
> > mentioned in this
> > previous message
> > https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us
> >
> > Is that not the case?
> >
> >> Re-reading that 2012 thread, the main new observation I'd make today
> >> is that parallel operation is a thing now, and it's not hard to foresee
> >> that sometime soon we'll want to parallelize INSERTs.  Which'd make it
> >> *really* hard to promise anything about the order of RETURNING output.
> > I think it's fine not promising anything about the order of RETURNING, but
> > it would be very helpful having a way of tracking what input row
> > generated a particular
> > output row. Basically the sentinel case in the original post,
> > without actually having to insert the sentinel into the table.
> >
> >> I think if you want to use RETURNING with multi-row inserts, the
> >> thing to do is more like
> >>
> >>    INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
> >>
> >> and then explicitly match up the returned "data" values rather than
> >> presuming they appear in the same order you wrote them in in VALUES.
> >> Admittedly this might be problematic if some of the VALUES rows
> >> are identical, but how much should you care?
> > Well, the example is very easy, but it's hard to generalize when
> > inserting multiple columns
> > with possible complex values in them, since it would mean matching on
> > possibly large json values,
> > arrays, etc. So definitely not ideal
> >
> > Thanks,
> > Federico
> >
> Can your client retain a hashmap of md5,data pairings, allowing the
> lookup on the way back using the returned data and supplied id?
>

When using unique columns or similar, that's something that is done,
but if there are no unique columns in the value no match can be done
reliably with the source data, since sqlalchemy is a library that
allows arbitrary schemas to be generated.

Thanks for the reply,
  Federico