Hi, Postgresql,
I want to understand how the query optimizers affect the output of the window functions.
For example, set "cpu_tuple_cost = 50" in postgresql.conf and start the server, I apply the regress test (make installcheck). The test of window function fails.
Checking the diff and I found the output of the window functions are different. For example,
For the following query:
SELECT sum(unique1) over (rows between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10;
The expected results are:
sum | unique1 | four
-----+---------+------
45 | 4 | 0
41 | 2 | 2
39 | 1 | 1
38 | 6 | 2
32 | 9 | 1
23 | 8 | 0
15 | 5 | 1
10 | 3 | 3
7 | 7 | 3
0 | 0 | 0
But the real results are:
sum | unique1 | four
-----+---------+------
45 | 0 | 0
45 | 1 | 1
44 | 2 | 2
42 | 3 | 3
39 | 4 | 0
35 | 5 | 1
30 | 6 | 2
24 | 7 | 3
17 | 8 | 0
9 | 9 | 1
There're altogether 6 queries in window test that outputs different query results.
I don't understand why the results are different. Intuitively, the queries show return the same results no matter what plan the optimizer choose.
I suspected the previous queries had some side effect on the latter one (e.g., change the current row), so I removed all the previous queries before this query in window.sql. But the result did not change.
Could anyone explain this behavior? Or point out how to investigate?
Thanks a lot!
Tianyin
Hi.
In short: if no explicit ordering specivied for a query the resulting set can be in any order. It is up to query optimizer to chose in what order the resulting tuples will be.
The window function used in this test case rely on the order of the resulting set (it sums from current to the last) so it will generate different results for different query plans.
I think for this test cases (window functions) explicit ordering should be specified. In "normal" cases order dependent window functions are newer used without explicit ordering.