Re: Question about (probably wrong) index scan cost for conditional indexes

Alex Lai <alai@xxxxxxxxxx> · Wed, 01 Feb 2012 14:21:45 -0500

Tom Lane wrote:
Maxim Boguk <maxim.boguk@xxxxxxxxx> writes:

I know there is issue with statistics over intarrays (it was there
very long time and sometime it's complicating things a lot).

However,  the 100x cost difference between:
SELECT * from test order by id limit 100;  (over "primary key (id)" btree index)
Limit  (cost=0.00..3.43 rows=100 width=37)
vs
SELECT * from test where sections && '{2}' order by value limit 100;
(over "test_value_in2section_key on test(value) where sections &&
'{2}'"   btree index)
Limit  (cost=0.00..539.29 rows=100 width=37)
seems wrong for me.

[ shrug... ]  It really is the fault of the bad rowcount estimate.
The cost estimates for the complete indexscans are reasonable.  However,
in one case you've got LIMIT thinking that it will fetch the first 100
out of 1000000 index entries, so it divides the indexscan cost estimate
by 10000, and gets something reasonably accurate.  In the other case,
LIMIT thinks it's going to fetch the first 100 out of 1000 index
entries, so it divides the indexscan cost estimate by 10, and comes out
with something not very accurate.  If the rowcount estimate for && had
been correct, those numbers would be much more alike.

Both queries performs the absolutely same task: fetch 100 entries from
the table based on the ideally suitable index (no post
processing/filtering were done at all... just return 100 sorted tuples
based on single index scan).

Well, you know that and I know that, but the exposed cost and rowcount
estimates for the IndexScan plan node imply something else entirely:
that the cost-per-tuple-fetched is a lot higher in the one case than
the other.  The LIMIT estimator has no reason, or method, to second
guess that.

And even if I drop the intarray index completely, than I still have a
wrong plan (bitmap scan + sort),  because planner cost for the index
scan over conditional index 100 more the it should be.
(e.g. there is still an issue even in absence of the intarray index).

Yeah, because it's not about the index, it's about the selectivity of
the && operator.  That estimate is wrong regardless of whether there
are any indexes involved.

Is absence of frequency statistics over intarrays somehow linked to
the wrong planner cost estimates for conditional index scan?

Well, we lack both the statistics and an operator selectivity function
that would know what to do with them.  Just a small matter of
programming ...

			regards, tom lane

Tom,

I had the same situation in one of my query.
Use the subquery can speed up almost by 100 times faster.

explain analyse
select FileId as "FileId", ESDT as "ESDT",1 as "Position" from 
V_FileMeta_L3
          where Archiveset = 61000 and ESDT= 'ESDT123'
                    and Source = 'SOURCE1234'
                    and (
                       (StartTime between '2012-01-28 
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
                                     and '2012-01-28 
07:41:27.000000Z'::timestamp + '100 days'::interval)
                          or
                        (EndTime between '2012-01-28 
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
                                    and '2012-01-28 
07:41:27.000000Z'::timestamp + '100 days'::interval)
                         )
                order by starttime
limit 1;

Limit  (cost=0.00..15.20 rows=1 width=22) (actual time=200.048..200.048 
rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..117596.32 rows=7736 width=22) (actual 
time=200.046..200.046 rows=1 loops=1)
        ->  Index Scan using ak_filemeta_l3_esdt_starttime_endtime on 
filemeta_l3 b  (cost=0.00..77200.55 rows=7736 width=22) (actual 
time=199.986..199.989
rows=2 loops=1)
              Index Cond: ((esdt)::text = 'ROLPT'::text)
              Filter: (((source)::text = 'OMPS-NPP'::text) AND 
(((starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND 
(starttime <= '2012-
05-07 07:41:27'::timestamp without time zone)) OR ((endtime >= 
'2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <= 
'2012-05-07 07:41:27'::ti
mestamp without time zone))))
        ->  Index Scan using pk_filemeta_archiveset on 
filemeta_archiveset a  (cost=0.00..5.21 rows=1 width=4) (actual 
time=0.025..0.025 rows=0 loops=2)
              Index Cond: ((a.fileid = b.fileid) AND (a.archiveset = 
61000))
Total runtime: 200.102 ms
(8 rows)

explain analyse
select FileId as "FileId", ESDT as "ESDT",1 as "Position" from 
V_FileMeta_L3
where FileId in (select fileid from V_FileMeta_L3
          where Archiveset = 61000 and ESDT= 'ESDT123'
                    and Source = 'SOUCE1234'
                    and (
                       (StartTime between '2012-01-28 
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
                                     and '2012-01-28 
07:41:27.000000Z'::timestamp + '100 days'::interval)
                          or
                        (EndTime between '2012-01-28 
05:59:57.000000Z'::timestamp - '-135 minutes'::interval
                                    and '2012-01-28 
07:41:27.000000Z'::timestamp + '100 days'::interval)
                         )
                order by starttime)
limit 1;

Limit  (cost=53038.95..53039.41 rows=1 width=14) (actual 
time=2.502..2.502 rows=1 loops=1)
 ->  Nested Loop  (cost=53038.95..6661196.32 rows=14611464 width=14) 
(actual time=2.480..2.480 rows=1 loops=1)
       ->  Nested Loop  (cost=53038.95..53716.80 rows=2549631 width=18) 
(actual time=2.464..2.464 rows=1 loops=1)
             ->  HashAggregate  (cost=53038.95..53040.95 rows=200 
width=4) (actual time=2.445..2.445 rows=1 loops=1)
                   ->  Sort  (cost=52923.05..52942.37 rows=7727 
width=12) (actual time=2.346..2.372 rows=43 loops=1)
                         Sort Key: b.starttime
                         Sort Method:  quicksort  Memory: 27kB
                         ->  Nested Loop  (cost=209.87..52424.05 
rows=7727 width=12) (actual time=0.358..2.251 rows=43 loops=1)
                               ->  Bitmap Heap Scan on filemeta_l3 b  
(cost=209.87..12077.66 rows=7727 width=12) (actual time=0.262..0.685 
rows=108 loops=1)
                                     Recheck Cond: ((((esdt)::text = 
'ROLPT'::text) AND (starttime >= '2012-01-28 08:14:57'::timestamp 
without time zone) AND (starttime <= '2012-05-07 07:41:2
7'::timestamp without time zone)) OR (((esdt)::text = 'ROLPT'::text) AND 
(endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND 
(endtime <= '2012-05-07 07:41:27'::timestamp without
time zone)))
                                     Filter: ((source)::text = 
'OMPS-NPP'::text)
                                     ->  BitmapOr  (cost=209.87..209.87 
rows=9895 width=0) (actual time=0.223..0.223 rows=0 loops=1)
                                           ->  Bitmap Index Scan on 
ak_filemeta_l3_esdt_starttime_endtime  (cost=0.00..107.72 rows=4961 
width=0) (actual time=0.127..0.127 rows=106 loops=1)
                                                 Index Cond: 
(((esdt)::text = 'ROLPT'::text) AND (starttime >= '2012-01-28 
08:14:57'::timestamp without time zone) AND (starttime <= '2012-05-0
7 07:41:27'::timestamp without time zone))
                                           ->  Bitmap Index Scan on 
ak_filemeta_l3_esdt_endtime  (cost=0.00..98.29 rows=4934 width=0) 
(actual time=0.093..0.093 rows=108 loops=1)
                                                 Index Cond: 
(((esdt)::text = 'ROLPT'::text) AND (endtime >= '2012-01-28 
08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07
:41:27'::timestamp without time zone))
                               ->  Index Scan using 
pk_filemeta_archiveset on filemeta_archiveset a  (cost=0.00..5.21 rows=1 
width=4) (actual time=0.011..0.011 rows=0 loops=108)
                                     Index Cond: ((a.fileid = b.fileid) 
AND (a.archiveset = 61000))
             ->  Index Scan using pk_filemeta_l3 on filemeta_l3 b  
(cost=0.00..3.37 rows=1 width=14) (actual time=0.015..0.015 rows=1 loops=1)
                   Index Cond: (b.fileid = a.fileid)
       ->  Index Scan using ak_filemeta_archiveset_fileid on 
filemeta_archiveset a  (cost=0.00..2.52 rows=6 width=4) (actual 
time=0.012..0.012 rows=1 loops=1)
             Index Cond: (a.fileid = b.fileid)
Total runtime: 2.711 ms

Hope this help.

Best Regards,
Alex Lai

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general