Re: Performance of Query 4 on TPC-DS Benchmark

Andrei Lepikhov <lepihov@xxxxxxxxx> · Tue, 12 Nov 2024 17:40:59 +0700

On 11/11/24 17:49, Ba Jinsheng wrote:

 >It is all the time a challenge for PostgreSQL to estimate such a filter
 >because of absent information on joint column distribution.
 >Can you research this way by building extended statistics on these
 >clauses? It could move the plan to the more optimal direction.

Thanks a lot for your effort to analyze this issue, and we really 
appreciate your suggestions!  Currently, we focus on exposing these 
issues that affect performance. In the future, we may consider to look 
into such a direction as you suggested.
It would be better to participate in further analysis - at least, to 
find out general solution and classify your findings.
For example in your query, if you replace CTE with a table (see in the 
attachment) and execute vacuum analyze on this table you will have 
well-estimated query which executes fast.

 > Have you tried any tools to improve the cardinality yet, like aqo [0]?
Yes, but it takes nearly 1 hour to run this query at a time, so I only 
run "EXPLAIN ANALYZE" once, and the performance seems slightly improved.
Hmm, AQO iteratively approaches optimal decision. Sometimes it is a 
local optimum, but we still don't have a method practical enough to kick 
its out of the trap of local optimum.

--
regards, Andrei Lepikhov
Attachment:
tpcds-1.sql

Description: application/sql
Attachment:
year_total.sql

Description: application/sql