Re: Strange (?) Index behavior?

Matt Clark <matt@xxxxxxxxxx> · Sat, 06 Nov 2004 00:14:15 +0000

So there are really only 9000 rows out of 76 million starting with 'I'?  How
about combining some techniques - you could create an index on the first two
chars of the field (should be selective enough to give an index scan),
select from that, and select the actual data with the like clause.

I was talking about the cost, not the number of rows.  About 74,000
rows are returned but the query only takes about 8 seconds to run. --

Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an 
indexscan is better, and also no surprise that the planner can't know 
that I is such an uncommon initial char.

with the partial index in place.

CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
CREATE INDEX idx_all ON table (field);
SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
AS approx WHERE field LIKE 'DE%';

That looks like a pretty slick way to create an index, I didn't know
there was such a way to do it.. but It appears that this will not work
with queries where the WHERE clause wants to find substrings longer
than 2 characters.

I don't see why not, it just uses the functional index to grap the 
1/(ascii_chars^2) of the rows that are of obvious interest, and then 
uses the standard index to filter that set..  Where it won't work is 
where you just want one initial char!  Which is why I suggested the 
silly query rewrite...

Going back to the initial problem -- having only one large, complete
index on the table (no partial indexes) the query "SELECT field FROM
table WHERE field LIKE 'A%';" does not use the index.  The query
"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
the single large index if it exists.

If you were planning the query, what would you do?  Assuming we're 
talking about A-Z as possible first chars, and assuming we don't know 
the distribution of those chars, then we have to assume 1/26 probability 
of each char, so a seq scan makes sense.  Whereas like 'JK%' should only 
pull 1/500 rows.

Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
table (field) WHERE field LIKE 'A%';" causes all queries with
substrings of any length to do index scans.provided I issue the query
as:

SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
-- or even --
SELECT field FROM table WHERE field LIKE 'A%';

The latter query, without the partial index described, does a
sequential scan on the table itself instead of an index scan.

Yes, because (I assume, Tom will no doubt clarify/correct), by creating 
the partial indices you create a lot more information about the 
distribution of the first char - either that, or the planner simply 
always uses an exactly matching partial index if available.

I _think_ that creating 26 partial indexes on '?%' is essentially the 
same thing as creating one functional index on substr(field,1,1), just 
messier, unless the partial indexes cause the planner to do something 
special...

M