Search Postgresql Archives

Re: Strange behavior of function date_trunc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

On 05.05.2021 16:55, Tomas Vondra wrote:
Well, it'd not like date_trunc is executed for each row while now() is executed only once. The functions are executed for each row in both cases...
A couple more experiments.
Since I can't to track usage of system functions, I decided to play with user defined stable function.

In this case it is a wrapper for date_trunc:
CREATE OR REPLACE FUNCTION user_date_trunc(field text, source timestamptz) RETURNS timestamptz
AS $$
BEGIN
    RETURN date_trunc(field, source);
END;
$$ LANGUAGE plpgsql STABLE;

SET track_functions = 'all';
SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc);


Let's create a test table:
CREATE TABLE t AS
    SELECT g.x, g.x::text AS y
    FROM generate_series('2021-01-01', '2021-06-01', '1 s'::interval) AS g(x) ORDER BY random();
VACUUM ANALYZE t;
SET random_page_cost = 1.1; -- for ssd disks


Previous query with user defined function:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz);
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..3520631.00 rows=13045175 width=31) (actual time=63.977..7157.852 rows=13046401 loops=1)    Filter: (x >= user_date_trunc('year'::text, '2021-05-06 00:00:00+03'::timestamp with time zone))
 Settings: random_page_cost = '1.1'
 Planning Time: 0.373 ms
 JIT:
   Functions: 2
   Options: Inlining true, Optimization true, Expressions true, Deforming true    Timing: Generation 0.404 ms, Inlining 53.122 ms, Optimization 7.100 ms, Emission 3.600 ms, Total 64.227 ms
 Execution Time: 7502.564 ms

SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = 'user_date_trunc'::regproc;
    funcname     |  calls
-----------------+----------
 user_date_trunc | 13046402

The function is executed once for each row in the table +1. As you explained earlier.
Ok.

But when there is an index on the column, the situation changes:
CREATE INDEX ON t(x);
ANALYZE t;
SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc);

EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz);
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_x_idx on t  (cost=0.69..373188.48 rows=13046480 width=31) (actual time=3.163..26476.012 rows=13046401 loops=1)    Index Cond: (x >= user_date_trunc('year'::text, '2021-05-06 00:00:00+03'::timestamp with time zone))
 Settings: random_page_cost = '1.1'
 Planning Time: 0.294 ms
 JIT:
   Functions: 3
   Options: Inlining false, Optimization false, Expressions true, Deforming true    Timing: Generation 0.551 ms, Inlining 0.000 ms, Optimization 0.380 ms, Emission 2.587 ms, Total 3.518 ms
 Execution Time: 26787.377 ms

SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = 'user_date_trunc'::regproc;
    funcname     | calls
-----------------+-------
 user_date_trunc |     2

Only two times!
Does having an index allow the function value to be cached?

By the way, an index is used to access the table, although seq scan are much better.

Now with scalar subquery:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= (SELECT user_date_trunc('year', '2021-05-06'::timestamptz));
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_x_idx on t  (cost=0.70..194745.67 rows=4348827 width=31) (actual time=3.000..26543.395 rows=13046401 loops=1)
   Index Cond: (x >= $0)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)
 Settings: random_page_cost = '1.1'
 Planning Time: 0.071 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true    Timing: Generation 0.671 ms, Inlining 0.000 ms, Optimization 0.188 ms, Emission 2.583 ms, Total 3.441 ms
 Execution Time: 26875.105 ms

SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = 'user_date_trunc'::regproc;
    funcname     | calls
-----------------+-------
 user_date_trunc |     3

The function is executed once, but the index is still in use.

The second question. What is the reason for choosing an index scan?

Seq scan with constant value only:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= '2021-01-01'::timestamptz;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..259011.00 rows=13046480 width=31) (actual time=2.135..913.628 rows=13046401 loops=1)
   Filter: (x >= '2021-01-01 00:00:00+03'::timestamp with time zone)
 Settings: random_page_cost = '1.1'
 Planning Time: 0.140 ms
 JIT:
   Functions: 2
   Options: Inlining false, Optimization false, Expressions true, Deforming true    Timing: Generation 0.392 ms, Inlining 0.000 ms, Optimization 0.163 ms, Emission 1.837 ms, Total 2.391 ms
 Execution Time: 1195.985 ms


The statistics on t.x shows that the condition in the query is not selective and seq scan are preferred over index scan.
SELECT n_distinct, (histogram_bounds::text::text[])[1]
FROM pg_stats WHERE tablename = 't' AND attname = 'x';
 n_distinct |    histogram_bounds
------------+------------------------
         -1 | 2021-01-01 00:01:10+03

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux