Hi everyone,
I have a SELECT query that uses a chain of CTEs (4) that is slow to run on a large
database. But if I change a where clause in one of the small CTEs from an
equality to an equivalent nested IN query, then the query becomes fast. Looking
at the query plan I can see that after the change Postgres avoids a large and
slow index scan by using a different index and aggregation. I am reluctant to
accept the accidental "fix" because it seems odd and counter intuitive. Can
anyone shed some light on what's going on? Is my fix the intended solution or
is there a better way to write this query?
We have a system which stores resource blobs and extracts search parameters
into a number of tables. The query in question tries to find all resources with
a specific tag (cte0) that are related to resource X (cte2) and are dated
before some (recent) date Y (cte1) and sort them by date (cte3 & cte4). The
query was working okay on a small database, but over time as the database grew
the query started to timeout. Which is why I am looking at it now.
Here are the results of EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON):
- Slow: https://explain.depesz.com/s/joHK
- Fast: https://explain.depesz.com/s/tgd4
Some more info about the CTEs:
- cte0: select resources with a specific tag
(most common resource types with the most common tag)
- cte1: filter resource by date no later than Y
(matches ~50% of the table, and most of resource from cte0)
- cte2: select resources that are related to a specific resource X
(matches 1-5 resources)
- cte3: adds the date as a sort value
- cte4: sorts the result
I have also created a gist:
https://gist.github.com/valeneiko/89f8cbe26db7ca2651b47524462b5d18
- Schema.sql: the SQL script to create tables and indexes
- Query.sql: the query I am trying to run
- Postgres Settings, Table sizes and Statistics are also included in the gist
PostgreSQL Version:
PostgreSQL 13.3 (Ubuntu 13.3-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
Setup: PostgreSQL is running inside a docker container on a dedicate node in a
Kubernetes cluster (using Zalando Spilo image: https://github.com/zalando/spilo)
Thank you,
Valentinas
I have a SELECT query that uses a chain of CTEs (4) that is slow to run on a large
database. But if I change a where clause in one of the small CTEs from an
equality to an equivalent nested IN query, then the query becomes fast. Looking
at the query plan I can see that after the change Postgres avoids a large and
slow index scan by using a different index and aggregation. I am reluctant to
accept the accidental "fix" because it seems odd and counter intuitive. Can
anyone shed some light on what's going on? Is my fix the intended solution or
is there a better way to write this query?
We have a system which stores resource blobs and extracts search parameters
into a number of tables. The query in question tries to find all resources with
a specific tag (cte0) that are related to resource X (cte2) and are dated
before some (recent) date Y (cte1) and sort them by date (cte3 & cte4). The
query was working okay on a small database, but over time as the database grew
the query started to timeout. Which is why I am looking at it now.
I have accidentally fixed the performance by replacing `system_id = 20` with
`system_id IN (SELECT system_id FROM fhir.system WHERE value = 'REDACTED')`.
The nested query here returns a single row with a value `20`.
`system_id IN (SELECT system_id FROM fhir.system WHERE value = 'REDACTED')`.
The nested query here returns a single row with a value `20`.
- Slow: https://explain.depesz.com/s/joHK
- Fast: https://explain.depesz.com/s/tgd4
Some more info about the CTEs:
- cte0: select resources with a specific tag
(most common resource types with the most common tag)
- cte1: filter resource by date no later than Y
(matches ~50% of the table, and most of resource from cte0)
- cte2: select resources that are related to a specific resource X
(matches 1-5 resources)
- cte3: adds the date as a sort value
- cte4: sorts the result
I have also created a gist:
https://gist.github.com/valeneiko/89f8cbe26db7ca2651b47524462b5d18
- Schema.sql: the SQL script to create tables and indexes
- Query.sql: the query I am trying to run
- Postgres Settings, Table sizes and Statistics are also included in the gist
PostgreSQL Version:
PostgreSQL 13.3 (Ubuntu 13.3-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
Setup: PostgreSQL is running inside a docker container on a dedicate node in a
Kubernetes cluster (using Zalando Spilo image: https://github.com/zalando/spilo)
Thank you,
Valentinas