Re: Strange (?) Index behavior?

"Joshua D. Drake" <jd@xxxxxxxxxxxxxxxxx> · Thu, 11 Nov 2004 13:48:27 -0800

-----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using sname_unique on "testtable"  (cost=0.00..34453.74
rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
loops=1)
  Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
  Filter: (sname ~~ 'AA%'::text)
Total runtime: 537477.737 ms
(4 rows)

Time: 537480.571 ms

Nothing you're going to do to the query is going to come up with a more
effective plan than this. It's using the index after all. It's never going to
be lightning fast because it has to process 75k rows.

However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
about 10 seconds.

I am confused about this statement. I have a table with 1.77 million 
rows that I use gist indexes on (TSearch) and I can pull out of it in 
less than 2 seconds.

Are you saying it should be taking 10 seconds because of the type of 
plan? 10 seconds seems like an awfullong time for this.

Sincerely,

Joshua D. Drake

That's my feeling as well, I thought the index was to blame because it
will be quite large, possibly large enough to not fit in memory nor be
quickly bursted up.

The 77ms before finding the first record is a bit suspicious. Have you
vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
results. You might try to REINDEX it as well, though I doubt that would help.

This table is *brand spanking new* for lack of a better term.  I have
the data for it in a CSV.  I load the CSV up which takes a bit, then
create the indexes, do a vacuum analyze verbose, and then posted the
results above.  I don't think running vacuum a more times is going to
change things, at least not without tweaking config settings that
affect vacuum. Not a single row has been inserted or altered since the
initial load.. it's just a test.

I can't give vacuum stats right now because the thing is reloading
(again) with different newfs settings -- something I figure I have the
time to fiddle with now, and seldom do at other times.  These numbers
though don't change much between 8K on up to 64K 'cluster' sizes.  I'm
trying it now with 8K page sizes, with 8K "minimum fragment" sizes. 
Should speed things up a tiny bit but not enough to really affect this
query.

Do you still see a need to have the output from the vacuum?

Actually you might consider clustering the table on sname_unique. That would
accomplish the same thing as the VACUUM FULL command and also speed up the
index scan. And the optimizer knows (if you analyze afterwards) it so it
should be more likely to pick the index scan. But currently you have to rerun
cluster periodically.

Clustering is really unworkable in this situation.  It would work now,
in this limited test case, but using it if this were to go into
production is unrealistic.  It would have to happen fairly often since
this table is updated frequently, which will break the clustering
quickly with MVCC.

Running it often.. well.. it has 70M+ rows, and the entire table is
copied, reordered, and rewritten.. so that's a lot of 'scratch space'
needed.  Finally, clustering locks the table..

Something I'd already considered but quickly ruled out because of
these reasons..

More ideas are welcome though. ;)

-Allen

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@xxxxxxxxxxxxxx

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@xxxxxxxxxxxxxxxxx - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
begin:vcard
fn:Joshua D. Drake
n:Drake;Joshua D.
org:Command Prompt, Inc.
adr:;;PO Box 215;Cascade Locks;Oregon;97014;USA
email;internet:jd@xxxxxxxxxxxxxxxxx
title:Consultant
tel;work:503-667-4564
tel;fax:503-210-0334
note:Command Prompt, Inc. is the largest and oldest US based commercial PostgreSQL support provider. We  provide the only commercially viable integrated PostgreSQL replication solution, but also custom programming, and support. We authored  the book Practical PostgreSQL, the procedural language plPHP, and adding trigger capability to plPerl.
x-mozilla-html:FALSE
url:http://www.commandprompt.com/
version:2.1
end:vcard