Slow query performance inside a transaction on a clean database

<val.janeiko@xxxxxxxxx> · Fri, 5 Mar 2021 17:55:37 -0000

Hi everyone,

I have a SELECT query that uses a long chain of CTEs (6) and is executed
repeatedly as part of the transaction (with different parameters). It is
executed quickly most of the time, but sometimes becomes very slow. I
managed to consistently reproduce the issue by executing a transaction
containing this query on an empty database. The query is fast for the first
150-170 inserted resources, but ~50% of the executions afterwards take 5.6s
instead of 1.4ms. Additionally it only becomes slow if resources are
inserted in a random order, if I insert resources sorted by
`start_date_time` column the query is always fast. 

The slow query is part of the transaction (with Repeatable Read isolation
level) that executes *create if not exists* type flow for 211 resources.
Each resource insertion, inserts multiple rows into each table. Each
resource has a unique `resource_surrogate_id`, which is part of every row
inserted for that resource. Inside a transaction search is performed using
values for a resource, before that resource is inserted, hence it always
returns 0 rows (only date values are changing and never overlap). Resources
are inserted in the order of increasing `resource_surrogate_id` (only
`resource_type_id == 52` rows are part of the transaction).

Some more info about the CTEs:
- cte0: always matches N rows (from 5*N rows inserted for each resource)
- cte1: always matches N rows (from 5*N rows inserted for each resource)
- cte2: always matches N rows (from 17*N rows inserted for each resource)
- cte3 when combined with cte4: always matches 0 rows (from 2*N rows
inserted for each resource)
- (if insertions are ordered by start_date_time, cte3 matches 0 rows, and
query is fast - execution plan not included)

Here are the results of EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT
JSON):
- Slow: https://explain.depesz.com/s/e4Fo
- Fast: https://explain.depesz.com/s/7HFJ

I have also created a gist:
https://gist.github.com/anyname2/e908d13d515e8970e599eb650cab15fe
- init.sql - is a script to create tables and indexes
- parametrized-query.sql - is the query being executed
- pgdump.sql - database dump

PostgreSQL Version: PostgreSQL 13.2 on x86_64-pc-linux-musl, compiled by gcc
(Alpine 10.2.1_pre1) 10.2.1 20201203, 64-bit
Setup: PostgreSQL is running inside a docker container (with default
parameters), issue is reproducible both in the Kubernetes cluster and
locally.

Can anyone help diagnose this?

There are a few question I have:
- Repeatable Read transaction is running on an empty database, hence it
should not match anything. Why are resources inserted in the current
transaction considered in the query? (if I understood the execution plan
correctly)
- What causes the slow case? How can I rewrite the query to avoid the slow
case?
- Is it my query not optimised? Or should the execution planner handle it
better somehow?

Thank you,
Valentinas