Re: Joins of data-modifying CTE with the target table

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Wed, 19 Apr 2023 08:52:39 -0700

On 4/19/23 08:37, Alex Bolenok wrote:
Hi list,

This popped up yesterday during a discussion at the Boston PostgreSQL 
group meetup, and Jesper Pedersen had advised that I post it here.

Imagine this setup:

CREATE TABLE IF NOT EXISTS mytable (id BIGSERIAL PRIMARY KEY, value TEXT 
NOT NULL);

WITH    insert_cte AS
         (
         INSERT
         INTO    mytable (value)
         VALUES  ('test')
         RETURNING
                 *
         )
SELECT  mytable.*
FROM    insert_cte
JOIN    mytable
USING   (id);

This query will return nothing, even though people would expect it to 
return the newly inserted record.

This is just a minimally reproducible example, in which you can easily 
work around the problem just by getting rid of the join to mytable. But 
during my consulting career, I've seen people try putting together more 
complex queries using the same pattern, and this always comes as a surprise.

I get why it's not working (because the statement is not allowed to see 
the tuples with its own cmin), but I was wondering if it was worth it at 
least to spell it out explicitly in the documentation.

Right now the documentation says:

https://www.postgresql.org/docs/15/queries-with.html#QUERIES-WITH-MODIFYING <https://www.postgresql.org/docs/15/queries-with.html#QUERIES-WITH-MODIFYING>

    RETURNING data is the only way to communicate changes between
    different WITH sub-statements and the main query

which I don't think is covering the JOIN issue (after all, I am using 
the RETURNING clause to communicate with the main query).

Can we please add this example to the documentation? I can do the 
wording if that's something worth adding.

To add to Tom's post.

"
Data-modifying statements in WITH usually have RETURNING clauses (see 
Section 6.4), as shown in the example above. It is the output of the 
RETURNING clause, not the target table of the data-modifying statement, 
that forms the temporary table that can be referred to by the rest of 
the query. If a data-modifying statement in WITH lacks a RETURNING 
clause, then it forms no temporary table and cannot be referred to in 
the rest of the query. Such a statement will be executed nonetheless. A 
not-particularly-useful example is:

...

The sub-statements in WITH are executed concurrently with each other and 
with the main query. Therefore, when using data-modifying statements in 
WITH, the order in which the specified updates actually happen is 
unpredictable. All the statements are executed with the same snapshot 
(see Chapter 13), so they cannot “see” one another's effects on the 
target tables. This alleviates the effects of the unpredictability of 
the actual order of row updates, and means that RETURNING data is the 
only way to communicate changes between different WITH sub-statements 
and the main query. An example of this is that in

WITH t AS (
    UPDATE products SET price = price * 1.05
    RETURNING *
)
SELECT * FROM products;

the outer SELECT would return the original prices before the action of 
the UPDATE, while in

WITH t AS (
    UPDATE products SET price = price * 1.05
    RETURNING *
)
SELECT * FROM t;

the outer SELECT would return the updated data.
"

So the RETURNING temp table is the only thing you have to work on.

Thank you!

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx