On 3/29/21 3:00 PM, Don Seiler wrote:
Good evening,
Please see my gist at
https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056
<https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056> for
a complete test case.
I tested this on PG 12.6 and 13.2 and observed the same on both.
We were expecting the queries that use dts_temp to only return 3 rows.
However the subquery starting at line 36 returns ALL 250,000 rows from
dts_orders. Note that the "order_id" field doesn't exist in the dts_temp
table, so I'm assuming PG is using the "order_id" field from the
dts_orders table. If I use explicit table references like in the query
at line 48, then I get the error I would expect that the "order_id"
column doesn't exist in dts_temp.
When I use the actual column name "a" for dts_temp, then I get the 3
rows back as expected.
I'm wondering if this is expected behavior that PG uses the
dts_orders.order_id value in the subquery "select order_id from
dts_temp" when dts_temp doesn't have its own order_id column. I would
have expected an error that the column doesn't exist. Seems very
counter-intuitive to think PG would use a column from a different table.
See:
https://www.postgresql.org/message-id/Pine.LNX.4.56.0308011345320.881@xxxxxxxxxxxxxxxxxx
This issue was discovered today when this logic was used in an UPDATE
and ended up locking all rows in a 5M row table and brought many apps to
a grinding halt. Thankfully it was caught and killed before it actually
updated anything.
Thanks,
Don.
--
Don Seiler
www.seiler.us <http://www.seiler.us>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx