Re: Strange (?) Index behavior?

Allen Landsidel <alandsidel@xxxxxxxxx> · Thu, 11 Nov 2004 16:12:51 -0500

On 11 Nov 2004 15:49:46 -0500, Greg Stark <gsstark@xxxxxxx> wrote:
> Allen Landsidel <alandsidel@xxxxxxxxx> writes:
> 
> 
> 
> > QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------------------------
> >  Index Scan using sname_unique on "testtable"  (cost=0.00..34453.74
> > rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
> > loops=1)
> >    Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
> >    Filter: (sname ~~ 'AA%'::text)
> >  Total runtime: 537477.737 ms
> > (4 rows)
> >
> > Time: 537480.571 ms
> 
> Nothing you're going to do to the query is going to come up with a more
> effective plan than this. It's using the index after all. It's never going to
> be lightning fast because it has to process 75k rows.
> 
> However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
> about 10 seconds.

That's my feeling as well, I thought the index was to blame because it
will be quite large, possibly large enough to not fit in memory nor be
quickly bursted up.

> The 77ms before finding the first record is a bit suspicious. Have you
> vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
> results. You might try to REINDEX it as well, though I doubt that would help.

This table is *brand spanking new* for lack of a better term.  I have
the data for it in a CSV.  I load the CSV up which takes a bit, then
create the indexes, do a vacuum analyze verbose, and then posted the
results above.  I don't think running vacuum a more times is going to
change things, at least not without tweaking config settings that
affect vacuum. Not a single row has been inserted or altered since the
initial load.. it's just a test.

I can't give vacuum stats right now because the thing is reloading
(again) with different newfs settings -- something I figure I have the
time to fiddle with now, and seldom do at other times.  These numbers
though don't change much between 8K on up to 64K 'cluster' sizes.  I'm
trying it now with 8K page sizes, with 8K "minimum fragment" sizes. 
Should speed things up a tiny bit but not enough to really affect this
query.

Do you still see a need to have the output from the vacuum?

> Actually you might consider clustering the table on sname_unique. That would
> accomplish the same thing as the VACUUM FULL command and also speed up the
> index scan. And the optimizer knows (if you analyze afterwards) it so it
> should be more likely to pick the index scan. But currently you have to rerun
> cluster periodically.

Clustering is really unworkable in this situation.  It would work now,
in this limited test case, but using it if this were to go into
production is unrealistic.  It would have to happen fairly often since
this table is updated frequently, which will break the clustering
quickly with MVCC.

Running it often.. well.. it has 70M+ rows, and the entire table is
copied, reordered, and rewritten.. so that's a lot of 'scratch space'
needed.  Finally, clustering locks the table..

Something I'd already considered but quickly ruled out because of
these reasons..

More ideas are welcome though. ;)

-Allen