Re: delete statement returning too many results

Tom Lane <tgl@xxxxxxxxxxxxx> · Tue, 29 Nov 2022 12:35:48 -0500

Harmen <harmen@xxxxxxxxx> writes:
> On Mon, Nov 28, 2022 at 12:11:53PM -0500, Tom Lane wrote:
>> So basically it's unsafe to run the sub-select more than once,
>> but the query as written leaves it up to the planner whether
>> to do that.  I'd suggest rephrasing as [...]

> I'm not the original poster, but I do use similar constructions for simple
> postgres queues. I've been trying for a while, but I don't understand where the
> extra rows come from, or what's "silent" about SKIP LOCKED.

Sorry, I should not have blamed SKIP LOCKED in particular; this
construction will misbehave with or without that.  The issue is with
using SELECT FOR UPDATE inside a DELETE or UPDATE that then modifies
the row that the subquery returned.  The next execution of the subquery
will, or should, return a different row: either some not-deleted row,
or the modified row.  So in this context, the result of the subquery
is volatile.  The point of putting it in a MATERIALIZED CTE is to
lock the result down regardless of that.

> Because we get different results depending on the plan postgres picks, I can
> see two options: either the query is broken, or postgres is broken.

You can argue that the planner should treat volatile subqueries
differently than it does today.  But the only reasonable way of
tightening the semantics would be to force re-execution of such a
subquery every time, even when it's not visibly dependent on the
outer query.  That would be pretty bad for performance, and I doubt
it would make the OP happy in this example, because what it would
mean is that his query "fails" every time not just sometimes.
(Because of that, I don't have too much trouble concluding that
the query is broken, whether or not you feel that postgres is
also broken.)

The bigger picture here is that we long ago decided that the planner
should not inquire too closely into the volatility of subqueries,
primarily because there are use-cases where people intentionally rely
on them not to be re-executed.  As an example, these queries give
different results:

regression=# select random() from generate_series(1,3);
       random        
---------------------
  0.7637195395988317
 0.09569374432524946
   0.490132093120365
(3 rows)

regression=# select (select random()) from generate_series(1,3);
       random       
--------------------
 0.9730230633436501
 0.9730230633436501
 0.9730230633436501
(3 rows)

In the second case, the sub-select is deemed to be independent
of the outer query and executed only once.  You can argue that
if that's what you want you should be forced to put the sub-select
in a materialized CTE to make that plain.  But we felt that that
would make many more people unhappy than happy, so we haven't
done it.  Maybe the question could be revisited once all PG
versions lacking the MATERIALIZED syntax are long dead. 

			regards, tom lane