help understanding create statistic

Luca Ferrari <fluca1978@xxxxxxxxx> · Thu, 28 Jun 2018 11:38:51 +0200

Hi all,
in order to better understand this feature of 10, I've created a table
like this:

CREATE TABLE expenses(
   ...
   day date,
   year int,
   CHECK( year = EXTRACT( year FROM day ) )
);

so that I can ensure 'year' and 'day' are tied together:

SELECT
count(*) FILTER( WHERE year = 2016 ) AS by_year,
count(*) FILTER( WHERE EXTRACT( year FROM day ) = 2016 ) AS by_day
FROM expenses;
-[ RECORD 1 ]-
by_year | 8784
by_day  | 8784

Then I created a statistic:

CREATE STATISTICS stat_day_year ( dependencies )
ON day, year
FROM expenses;

select * from pg_statistic_ext ;
-[ RECORD 1 ]---+---------------------
stxrelid        | 42833
stxname         | stat_day_year
stxnamespace    | 2200
stxowner        | 16384
stxkeys         | 3 5
stxkind         | {f}
stxndistinct    |
stxdependencies | {"3 => 5": 1.000000}

Now, having an index on the extract year of day as follows:

CREATE INDEX idx_year
ON expenses ( EXTRACT( year FROM day ) );

why is the planner not choosing to use such index on a 'year' raw query?

EXPLAIN SELECT * FROM expenses
WHERE year = 2016;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..92782.34 rows=8465 width=32)
   Workers Planned: 2
   ->  Parallel Seq Scan on expenses  (cost=0.00..90935.84 rows=3527 width=32)
         Filter: (year = 2016)

The number of rows are correct, but I was expecting it to use the same
index as a query like "WHERE EXTRACT( year FROM day) = 2016" triggers.

Even altering the year column to not null does show any change, and
this is the plan obtained turning off seq_scan (to see the costs):

EXPLAIN ANALYZE SELECT * FROM expenses
WHERE year = 2016;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on expenses  (cost=10000000000.00..10000127402.44 rows=8451
width=32) (actual time=972.734..2189.300 rows=8784 loops=1)
   Filter: (year = 2016)
   Rows Removed by Filter: 4991216

Am I misunderstaing this functional dependency?

Thanks,
Luca