Re: Question about (probably wrong) index scan cost for conditional indexes

Tom Lane <tgl@xxxxxxxxxxxxx> · Sun, 29 Jan 2012 21:11:02 -0500

Maxim Boguk <maxim.boguk@xxxxxxxxx> writes:
> I know there is issue with statistics over intarrays (it was there
> very long time and sometime it's complicating things a lot).

> However,  the 100x cost difference between:
> SELECT * from test order by id limit 100;  (over "primary key (id)" btree index)
> Limit  (cost=0.00..3.43 rows=100 width=37)
> vs
> SELECT * from test where sections && '{2}' order by value limit 100;
> (over "test_value_in2section_key on test(value) where sections &&
> '{2}'"   btree index)
> Limit  (cost=0.00..539.29 rows=100 width=37)
> seems wrong for me.

[ shrug... ]  It really is the fault of the bad rowcount estimate.
The cost estimates for the complete indexscans are reasonable.  However,
in one case you've got LIMIT thinking that it will fetch the first 100
out of 1000000 index entries, so it divides the indexscan cost estimate
by 10000, and gets something reasonably accurate.  In the other case,
LIMIT thinks it's going to fetch the first 100 out of 1000 index
entries, so it divides the indexscan cost estimate by 10, and comes out
with something not very accurate.  If the rowcount estimate for && had
been correct, those numbers would be much more alike.

> Both queries performs the absolutely same task: fetch 100 entries from
> the table based on the ideally suitable index (no post
> processing/filtering were done at all... just return 100 sorted tuples
> based on single index scan).

Well, you know that and I know that, but the exposed cost and rowcount
estimates for the IndexScan plan node imply something else entirely:
that the cost-per-tuple-fetched is a lot higher in the one case than
the other.  The LIMIT estimator has no reason, or method, to second
guess that.

> And even if I drop the intarray index completely, than I still have a
> wrong plan (bitmap scan + sort),  because planner cost for the index
> scan over conditional index 100 more the it should be.
> (e.g. there is still an issue even in absence of the intarray index).

Yeah, because it's not about the index, it's about the selectivity of
the && operator.  That estimate is wrong regardless of whether there
are any indexes involved.

> Is absence of frequency statistics over intarrays somehow linked to
> the wrong planner cost estimates for conditional index scan?

Well, we lack both the statistics and an operator selectivity function
that would know what to do with them.  Just a small matter of
programming ...

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general