Re: Strange (?) Index behavior?

Allen Landsidel <alandsidel@xxxxxxxxx> · Fri, 5 Nov 2004 17:40:23 -0500

On Fri, 05 Nov 2004 16:08:56 -0500, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> Allen Landsidel <alandsidel@xxxxxxxxx> writes:
> > With seqscan enabled however, "AB%" will use the index, but "A%" will not.
>
> > The estimated cost for the query is much higher without the partial
> > indexes than it is with them, and the actual runtime of the query is
> > definitely longer without the partial indexes.
>
> OK.  This suggests that the planner is drastically misestimating
> the selectivity of the 'A%' clause, which seems odd to me since in
> principle it could get that fairly well from the ANALYZE histogram.
> But it could well be that you need to increase the resolution of the
> histogram --- see ALTER TABLE SET STATISTICS.

I will look into this.

>
> Did you ever show us EXPLAIN ANALYZE results for this query?

No, I didn't.  I am running it now without the partial index on to
give you the results but it's (the 'A%' problem query) been running
pretty much since I got this message (an hour ago) and is still not
finished.

The EXPLAIN results without the ANALYZE will have to suffice until
it's done, I can readd the index, and run it again, so you have both
to compare to.

First two queries run where both the main index, and the 'A%' index exist:

-- QUERY 1
search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%';
                                       QUERY PLAN
-------------------------------------------------------------------------------------------
Index Scan using test_name_idx_a on "test"  (cost=0.00..8605.88
rows=391208 width=20)
  Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text))
  Filter: (test_name ~~ 'A%'::text)
(3 rows)

Time: 16.507 ms

-- QUERY 2
search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%' AND
test_name LIKE 'AB%';
                                                              QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using test_name_idx_a on "test"  (cost=0.00..113.79
rows=28 width=20)
  Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text)
AND (test_name >= 'AB'::text) AND (test_name < 'AC'::text))
  Filter: ((test_name ~~ 'A%'::text) AND (test_name ~~ 'AB%'::text))
(3 rows)

Time: 3.197 ms

Ok, now the same two queries after a DROP INDEX test_name_idx_a;

search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%';
                                         QUERY PLAN
-----------------------------------------------------------------------------------------------
Index Scan using test_name_unique on "test"  (cost=0.00..1568918.66
rows=391208 width=20)
  Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text))
  Filter: (test_name ~~ 'A%'::text)
(3 rows)

Time: 2.470 ms

search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'AB%';
                                       QUERY PLAN
-------------------------------------------------------------------------------------------
Index Scan using test_name_unique on "test"  (cost=0.00..20379.49
rows=5081 width=20)
  Index Cond: ((test_name >= 'AB'::text) AND (test_name < 'AC'::text))
  Filter: (test_name ~~ 'AB%'::text)
(3 rows)

Time: 2.489 ms

------------------
Copying just the costs you can see the vast difference...
Index Scan using test_name_unique on "test"  (cost=0.00..1568918.66
rows=391208 width=20)
Index Scan using test_name_unique on "test"  (cost=0.00..20379.49
rows=5081 width=20)

vs

Index Scan using test_name_idx_a on "test"  (cost=0.00..8605.88
rows=391208 width=20)
Index Scan using test_name_idx_a on "test"  (cost=0.00..113.79
rows=28 width=20)

Lastly no, neither of these row guesstimates is correct..  I'll get
back and tell you how much they're off by if it's important, once this
query is done.

The odd thing is it used the index scan here each time -- that has not
always been the case with the main unique index, it's trying to make a
liar out of me heh.

I'm used to the estimates and plan changing from one vacuum analyze to
the next, even without any inserts or updates between.. the index scan
is always used however when I have the partial indexes in place, and
something like..

CREATE TEMP TABLE t1 AS
 SELECT field FROM table
 WHERE field LIKE 'A%'
 AND field LIKE 'AA%';

runs in 6-8 seconds as well, with a bit under 100k records.

-Allen