Search Postgresql Archives

Help needed to understand query planner regression with incremental sort

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Quick description: After upgrading our Aurora PostgreSQL engine from v11 to v15, one of our often run SQL's started taking tens of seconds instead of running sub 100ms. Explain analyze showed that the query planner had switched to using incremental sort. Running ANALYZE on the table fixed the issue temporarily, but it came back after few days.

I'm looking to dig a bit deeper to be able to submit a bug report to the PostgreSQL team so maybe something can be done to fix this, as this definitely feels as a bug (the query running times are 100x in difference) but I need a bit of help as I'm new to this deep PostgreSQL analysis and I'm struggling finding resource how to proceed in such a case. And maybe it's an issue with my own setup or database configuration instead (ie. me just being stupid).

I know that I can solve the issue for my own environments with turning incremental sort off in the DB parameters, but I feel this might be worth putting a bit time into to be able to find the root cause and help me understand the planner a bit better.

More detailed description:
Note: I don't have reproduction steps at this time, this is one of the reasons I'm asking help to figure out what could I try to do to start trying reproducing this in local setup. Below info is just giving out as much info as I can to help understand the setup I have.

The query under question is rather simplistic as far as SQL itself goes at least (attached: original_query.sql):
select * from "products" inner join "shops" on "shops"."id" = "products"."shop_id" where "products"."status" = 'published' and products.desc_tsv @@ to_tsquery('simple', 'nike:*') order by "products"."pinned_at" ASC, "products"."listed_at" DESC limit 61;

With normal running conditions this query produces the following query plan: https://explain.dalibo.com/plan/af8ch7a59ch6459d (attached: good_planner_path.txt)

With incremental sorting path, it produces the following https://explain.dalibo.com/plan/428564152aa3ba37 (attached: incremental_sort_planner_path.txt)

I've tried to include as slim table / index setup as possible, but the tables are actually bigger and there are a lot more indexes present, I don't know how other indexes and / or columns might affect the planner path in this case.
https://dbdiagram.io/d/64771e48722eb7749422715e (attached: schema_setup.sql)

In production DB the size of products table is around 5 452 840 rows.
According to the information_schema the size of the products table is around 8.1GB.

If the incremental sort query plan starts happening, running ANALYZE products; fixes the issue and the planner switches back to the original plan.

What am I asking for?
Any ideas / help / queries that would get me closer to understanding why planner chooses so much slower query in this case or ideas how to start trying to reproduce this locally as it seems to happen intermittently in production environment, I'm unsure if I can replicate that type of load locally. Maybe someone spots very obvious mistakes / issues with the table / index / query setup right away that I've missed.


Any help appreciated,
Henrik.
QUERY PLAN
Limit  (cost=2285.01..5180.32 rows=61 width=2158) (actual time=5351.382..5351.399 rows=61 loops=1)
  ->  Incremental Sort  (cost=2285.01..1242947.06 rows=26139 width=2158) (actual time=5351.381..5351.393 rows=61 loops=1)
"        Sort Key: products.pinned_at, products.listed_at DESC"
        Presorted Key: products.pinned_at
        Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 266kB  Peak Memory: 266kB
        Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average Memory: 362kB  Peak Memory: 362kB
        ->  Nested Loop  (cost=0.86..1241366.91 rows=26139 width=2158) (actual time=5.292..5308.158 rows=18411 loops=1)
              ->  Index Scan Backward using products_pinned_at_ix on products  (cost=0.43..1172249.47 rows=26139 width=1460) (actual time=5.263..5203.180 rows=18411 loops=1)
                    Filter: ((desc_tsv @@ '''nike'':*'::tsquery) AND (status = 'published'::prod_status))
                    Rows Removed by Filter: 5415895
              ->  Memoize  (cost=0.43..3.64 rows=1 width=698) (actual time=0.004..0.004 rows=1 loops=18411)
                    Cache Key: products.shop_id
                    Cache Mode: logical
                    Hits: 10434  Misses: 7977  Evictions: 505  Overflows: 0  Memory Usage: 8194kB
                    ->  Index Scan using shops_pkey on shops  (cost=0.42..3.63 rows=1 width=698) (actual time=0.005..0.005 rows=1 loops=7977)
                          Index Cond: (id = products.shop_id)
Planning Time: 0.813 ms
Execution Time: 5353.360 ms

Attachment: original_query.sql
Description: Binary data

QUERY PLAN
Limit  (cost=0.85..9258.41 rows=61 width=2162) (actual time=2.352..33.389 rows=61 loops=1)
  ->  Nested Loop  (cost=0.85..4025217.24 rows=26523 width=2162) (actual time=2.351..33.376 rows=61 loops=1)
        ->  Index Scan using products_pinned_at_listed_at_ix on products  (cost=0.43..3929423.12 rows=26523 width=1463) (actual time=2.325..32.872 rows=61 loops=1)
              Filter: ((desc_tsv @@ '''nike'':*'::tsquery) AND (status = 'published'::prod_status))
              Rows Removed by Filter: 3376
        ->  Index Scan using shops_pkey on shops  (cost=0.42..3.61 rows=1 width=699) (actual time=0.006..0.006 rows=1 loops=61)
              Index Cond: (id = products.shop_id)
Planning Time: 5.115 ms
Execution Time: 33.509 ms

Attachment: schema_setup.sql
Description: Binary data


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux