Re: Setting Statistics on Functional Indexes

Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx> · Wed, 24 Oct 2012 14:31:11 -0500

On 10/24/2012 02:11 PM, Tom Lane wrote:

It's not particularly (not that you've even defined what you think
"optimistic" is, much less mentioned what baseline you're comparing
to).

The main flaw with my example is that it's random. But I swear I'm not 
making it up! :)

There seems to be a particularly nasty edge case we're triggering, then. 
Like I said, it's worse when col1+col2 don't match anything. In that 
case, it's using the trunc index on the date column, which has 
demonstrably worse performance. Here are the two analyzes I got 
before/after front-loading statistics.

Before stats increase:

 Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=78.282..78.282 
rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc on date_test 
(cost=0.00..9.37 rows=1 width=23) (actual time=78.274..78.274 rows=0 
loops=1)
         Index Cond: (date_trunc('day'::text, action_date) >= 
'2012-10-24 00:00:00'::timestamp without time zone)
         Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
 Total runtime: 78.317 ms

And then after. I used your unofficial trick to set it to 1000:

alter index idx_date_test_action_date_trunc
      alter column date_trunc set statistics 1000;
analyze date_test;

 Sort  (cost=9.83..9.83 rows=1 width=23) (actual time=0.038..0.038 
rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_col1_col2 on date_test 
(cost=0.00..9.82 rows=1 width=23) (actual time=0.030..0.030 rows=0 loops=1)
         Index Cond: (((col1)::text = 'S:96'::text) AND (col2 = 
657::numeric))
         Filter: (date_trunc('day'::text, action_date) >= '2012-10-24 
00:00:00'::timestamp without time zone)
 Total runtime: 0.066 ms

This is on a bone-stock PG 9.1.6 from Ubuntu 12.04 LTS, with 
default_statistics increased to 500. The only thing I bumped up was the 
functional index between those two query plans.

But then I noticed something else. I reverted back to the old 500 
default for everything, and added an index:

create index idx_date_test_action_date_trunc_col1
    on date_test (date_trunc('day', action_date), col1);

I think we can agree that this index would be more selective than the 
one on date_trunc by itself. Yet:

 Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=77.055..77.055 
rows=0 loops=1)
   Sort Key: id, action_date
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using idx_date_test_action_date_trunc on date_test 
(cost=0.00..9.37 rows=1 width=23) (actual time=77.046..77.046 rows=0 
loops=1)
         Index Cond: (date_trunc('day'::text, action_date) >= 
'2012-10-24 00:00:00'::timestamp without time zone)
         Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
 Total runtime: 77.091 ms

All I have to say about that is: wat.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxxxxxxxxx

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance