Hello great day, we have a strange case with slow query and would like some help.
I've already read the article https://wiki.postgresql.org/wiki/Slow_Query_Questions
Explain: https://paste.depesz.com/s/PLP
Explain2: https://explain-postgresql.com/archive/explain/8e4b573c5f7bcf3a0d30675a430051fd:0:2023-12-26
tunning: https://paste.depesz.com/s/dXa
We have citus cluster with the following configuration: 1 master + 3 data nodes, each machine have:
- 24 cores (Intel Xeon E5 2620)
- 192 GB RAM
- 1TB SSD
each node has configured postgres settings using tuning.sql
The main Table DDL is in (ddl.sql)
also distributed are as follow:
SELECT create_distributed_table('salert_post', 'id',shard_count := 72);
SELECT create_distributed_table('salert_q56', 'post',
colocate_with => 'salert_post');
when run the query (query.sql) as you can see in explain (plan4_v3.txt) citus take about 18s to run all fragments
but each fragment take at most 2s, so my questions are- why citus take this time in run all fragments?
- if I tuned each postgres node efficiently why take much time to make sort and aggregate with citus results?
good night, I hope you can help me with some ideas
also we remove partitions, and test only with citus, but query took more than a minute.
as a note, we not have 72 shards on the same node we have 72 in total, 24 shards each node.
I think the problem was in Sort and in GroupAggregate I no have idea how speed up this in master node, because the Custom Scan (Citus Adaptive) is not too slow, the most time is consumed in master on Sort and group
I hope you can help me.