performance of analytical query

Hi folks,

We have found that (probably after a VACUUM ANALYZE) one analytical query started to be slow on our production DB. Moreover, more or less the same plan is chosen on our testing data (how to restore the testing data is described at the end of this email); in other words, the same problem exists in both scenarios (production and testing data): because of bad estimates, the planner uses a nested loop that scans a CTE several thousand times: https://explain.dalibo.com/plan/sER#plan/node/87 (the query is included on dalibo).

We improved the query, guided by some intuitive thoughts about how it works, and got a much faster (120x) plan https://explain.dalibo.com/plan/M21#plan/node/68. We continued with further improvement/simplification of the query, but we again got a similar plan https://explain.dalibo.com/plan/nLb#plan/node/72 with a nested loop and the original inferior performance. I realized that the success of the intermediate plan (M21) is somewhat random, as it is based on bad estimates as well.

Further, I tried a version forcing the CTE not to be materialized https://explain.dalibo.com/plan/0Tp#plan and a version using the PG default CTE materialization policy https://explain.dalibo.com/plan/g7M#plan/node/68. Neither helped.
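
For clarity, this is the PG 12+ syntax I am toggling (a generic sketch only; w_1p_data stands in for the real CTE, and the measurement/plot tables are made up for illustration):

    -- force the CTE to be materialized once (optimization fence):
    WITH w_1p_data AS MATERIALIZED (
        SELECT plot_id, SUM(value) AS total
        FROM   measurement
        GROUP  BY plot_id
    )
    SELECT p.plot_id, d.total
    FROM   plot p
    JOIN   w_1p_data d ON d.plot_id = p.plot_id;

    -- let the planner inline the CTE into the outer query:
    WITH w_1p_data AS NOT MATERIALIZED (
        SELECT plot_id, SUM(value) AS total
        FROM   measurement
        GROUP  BY plot_id
    )
    SELECT p.plot_id, d.total
    FROM   plot p
    JOIN   w_1p_data d ON d.plot_id = p.plot_id;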

Do you have any idea how to get HASH JOINs in the CTE w_1p_data instead of NESTED LOOPs?
* Add some extended statistics to avoid bad estimates on the "lower-level" CTEs (see the first sketch after this list)?
* Some PG configuration (I am currently only disabling JIT [1])?
* Rewrite the query as several smaller pieces and use PL/pgSQL to put them together?
* In a slightly more complicated function I used temporary tables to be able to narrow the statistics [2] (see the second sketch below), but I am afraid of system-catalog bloat because the function is used heavily in production (hundreds of thousands of calls per day when data are being analyzed).
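
For the first two bullets, a minimal sketch of what I have in mind (the statistics object, table, and column names are made up for illustration; CREATE STATISTICS is PG 10+, the mcv kind PG 12+):

    -- extended statistics on correlated columns feeding the bad estimate:
    CREATE STATISTICS stat_plot_panel (ndistinct, dependencies, mcv)
        ON panel_id, cluster_id FROM plot;
    ANALYZE plot;

    -- session-local planner nudge, scoped to one transaction,
    -- making nested loops unattractive so hash joins can win:
    BEGIN;
    SET LOCAL enable_nestloop = off;
    -- ... run the analytical query here ...
    COMMIT;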
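
And the temporary-table variant from the last bullet, roughly as in [2] (a sketch with a hypothetical function name and made-up tables; ON COMMIT DROP keeps the table transaction-local, though the catalog churn from frequent create/drop is exactly what worries me):

    CREATE OR REPLACE FUNCTION fn_1p_example()  -- hypothetical name
    RETURNS bigint
    LANGUAGE plpgsql
    AS $$
    DECLARE
        v_result bigint;
    BEGIN
        -- materialize the intermediate result so the planner sees
        -- real statistics instead of default CTE row guesses
        DROP TABLE IF EXISTS w_1p_data;
        CREATE TEMPORARY TABLE w_1p_data ON COMMIT DROP AS
            SELECT plot_id, SUM(value) AS total
            FROM   measurement
            GROUP  BY plot_id;
        ANALYZE w_1p_data;  -- accurate row counts and ndistinct

        SELECT count(*) INTO v_result
        FROM   plot p
        JOIN   w_1p_data d ON d.plot_id = p.plot_id;

        RETURN v_result;
    END;
    $$;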

-------------------------------------------------------------------
how to restore data
===================
ERD of the schema is also available [3].

testing data as part of an extension
-------------------------------------
It is possible to install [4] the extension https://gitlab.com/nfiesta/nfiesta_pg and run its regression tests [5] (make installcheck-all). This creates the database contrib_regression_fst_1p (besides other DBs) and populates it with the testing data. The regression test fst_1p_data in fact exercises the functionality/code I am experimenting with.

using DB dump (without extension)
----------------------------------
It is also possible to create the mentioned testing DB by simply downloading the DB dumps from
https://drive.google.com/drive/folders/1OVJEISpfuvbxPQG1ArDmSQxZByNZN0xG?usp=sharing
and then creating a DB with the postgis extension and restoring one of the dumps (spelled out below):
* perf_test.sql (plain format) to be used with psql \i
* perf_test.dump to be used with pg_restore
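
Spelled out, the restore amounts to something like this (a psql session; run as a role allowed to create databases):

    CREATE DATABASE contrib_regression_fst_1p;
    \c contrib_regression_fst_1p
    CREATE EXTENSION postgis;
    \i perf_test.sql
    -- or, instead of \i, from the shell:
    --   pg_restore -d contrib_regression_fst_1p perf_test.dump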

Thank you for any suggestions, Jiří.

[1] https://gitlab.com/nfiesta/nfiesta_pg/-/blob/master/.gitlab-ci.yml#L10
[2] https://gitlab.com/nfiesta/nfiesta_pg/-/blob/master/functions/extschema/fn_2p_data.sql#L79
[3] https://gitlab.com/nfiesta/nfiesta_pg/-/wikis/Data-Storage#v25x.
[4] https://gitlab.com/nfiesta/nfiesta_pg/-/wikis/Installation
[5] https://gitlab.com/nfiesta/nfiesta_pg/-/jobs/1762550188
