Hello,
On 05.05.2021 16:55, Tomas Vondra wrote:
Well, it'd not like date_trunc is executed for each row while now() is
executed only once. The functions are executed for each row in both
cases...
A couple more experiments.
Since I can't to track usage of system functions, I decided to play with
user defined stable function.
In this case it is a wrapper for date_trunc:
CREATE OR REPLACE FUNCTION user_date_trunc(field text, source
timestamptz) RETURNS timestamptz
AS $$
BEGIN
RETURN date_trunc(field, source);
END;
$$ LANGUAGE plpgsql STABLE;
SET track_functions = 'all';
SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc);
Let's create a test table:
CREATE TABLE t AS
SELECT g.x, g.x::text AS y
FROM generate_series('2021-01-01', '2021-06-01', '1 s'::interval)
AS g(x) ORDER BY random();
VACUUM ANALYZE t;
SET random_page_cost = 1.1; -- for ssd disks
Previous query with user defined function:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..3520631.00 rows=13045175 width=31) (actual
time=63.977..7157.852 rows=13046401 loops=1)
Filter: (x >= user_date_trunc('year'::text, '2021-05-06
00:00:00+03'::timestamp with time zone))
Settings: random_page_cost = '1.1'
Planning Time: 0.373 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true,
Deforming true
Timing: Generation 0.404 ms, Inlining 53.122 ms, Optimization 7.100
ms, Emission 3.600 ms, Total 64.227 ms
Execution Time: 7502.564 ms
SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid =
'user_date_trunc'::regproc;
funcname | calls
-----------------+----------
user_date_trunc | 13046402
The function is executed once for each row in the table +1. As you
explained earlier.
Ok.
But when there is an index on the column, the situation changes:
CREATE INDEX ON t(x);
ANALYZE t;
SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc);
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Index Scan using t_x_idx on t (cost=0.69..373188.48 rows=13046480
width=31) (actual time=3.163..26476.012 rows=13046401 loops=1)
Index Cond: (x >= user_date_trunc('year'::text, '2021-05-06
00:00:00+03'::timestamp with time zone))
Settings: random_page_cost = '1.1'
Planning Time: 0.294 ms
JIT:
Functions: 3
Options: Inlining false, Optimization false, Expressions true,
Deforming true
Timing: Generation 0.551 ms, Inlining 0.000 ms, Optimization 0.380
ms, Emission 2.587 ms, Total 3.518 ms
Execution Time: 26787.377 ms
SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid =
'user_date_trunc'::regproc;
funcname | calls
-----------------+-------
user_date_trunc | 2
Only two times!
Does having an index allow the function value to be cached?
By the way, an index is used to access the table, although seq scan are
much better.
Now with scalar subquery:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= (SELECT user_date_trunc('year', '2021-05-06'::timestamptz));
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using t_x_idx on t (cost=0.70..194745.67 rows=4348827
width=31) (actual time=3.000..26543.395 rows=13046401 loops=1)
Index Cond: (x >= $0)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.26 rows=1 width=8) (actual
time=0.030..0.030 rows=1 loops=1)
Settings: random_page_cost = '1.1'
Planning Time: 0.071 ms
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true,
Deforming true
Timing: Generation 0.671 ms, Inlining 0.000 ms, Optimization 0.188
ms, Emission 2.583 ms, Total 3.441 ms
Execution Time: 26875.105 ms
SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid =
'user_date_trunc'::regproc;
funcname | calls
-----------------+-------
user_date_trunc | 3
The function is executed once, but the index is still in use.
The second question. What is the reason for choosing an index scan?
Seq scan with constant value only:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= '2021-01-01'::timestamptz;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..259011.00 rows=13046480 width=31) (actual
time=2.135..913.628 rows=13046401 loops=1)
Filter: (x >= '2021-01-01 00:00:00+03'::timestamp with time zone)
Settings: random_page_cost = '1.1'
Planning Time: 0.140 ms
JIT:
Functions: 2
Options: Inlining false, Optimization false, Expressions true,
Deforming true
Timing: Generation 0.392 ms, Inlining 0.000 ms, Optimization 0.163
ms, Emission 1.837 ms, Total 2.391 ms
Execution Time: 1195.985 ms
The statistics on t.x shows that the condition in the query is not
selective and seq scan are preferred over index scan.
SELECT n_distinct, (histogram_bounds::text::text[])[1]
FROM pg_stats WHERE tablename = 't' AND attname = 'x';
n_distinct | histogram_bounds
------------+------------------------
-1 | 2021-01-01 00:01:10+03
--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company