On Mon, 05 Apr 2004 11:55:17 -0500, Wes Palmer
<Wesley.R.Palmer@syntegra.com> wrote:

>> match, there's no need to fetch the heap tuple.  Unfortunately the
>> planner doesn't believe that this is possible:
>>
>>     /* Don't believe estimates less than 1... */
>>     if (tuples_fetched < 1.0)
>>         tuples_fetched = 1.0;
>
>Unless you are fetching a majority of the rows, wouldn't it always be
>cheaper to consult the index instead of reading the full record?

In many (if not most) cases yes; always no.  Think of a table having
100 rows per page.  To select 1% of the rows you might have to access
every page of the table -- at least theoretically in the worst case,
but you get the point.

It is the optimizer's job to find out which one of a possibly large
number of query plans is the best.  Unfortunately the optimizer is not
perfect (yet :-)).  The code snippet above might be a spot where some
improvement is possible.  If this really contributes to your case, you
have to live with it for now.

>>  ->  Nested Loop  (cost=0.00..75565.60 rows=1 width=0)

>>  ->  Nested Loop  (cost=0.00..73745.26 rows=11 width=0)

>In the above example, the first case is where both the compound and the
>single-column indexes existed.  I ran the test, deleted the
>single-column index, then ran the test again.

Yes, this is what I understood.  What I don't understand is why the
optimizer didn't choose the second plan in the first run.

> I did not run vacuum or analyze in between.

Auto-vacuum?

>> more readable with \x):
>
>See enclosed tar file.  f1 is the first couple of commands.  f1a is
>with random_page_cost=4 and f1b is with random_page_cost=.8

SELECT * FROM pg_stats WHERE tablename='messages' ?

What were the other settings (sort_mem, effective_cache_size)?

>You can see that I get the same plan (filter vs. index) even keeping
>the addresses table out of the picture.

Now that you join only two tables you could experiment with forcing
other join methods (SET enable_nestloop, enable_mergejoin,
enable_hashjoin); see the PS below for a sketch.

>It occurs to me that de-normalizing it a bit and putting a duplicate
>message_date in the message_recipients may have been a good thing to
>do.  The result set could have been obtained quicker.  But, I was
>trying to keep everything normalized as much as possible and wanted to
>keep the message_recipients as small as possible because it would be
>growing the fastest.  Trying to put that in now would be a bit of a
>challenge, since I'd have to update 70 million records based on the
>value in 20 million records.

Another de-normalization idea:

    ALTER TABLE messages ADD COLUMN mkey2 numeric(12,0);
    UPDATE messages SET mkey2 = message_key;

Do this in batches and run VACUUM between the UPDATEs (see the PPS
below for a sketch).

    DROP INDEX messages_i_id_mdate;
    CREATE INDEX ... ON messages(mkey2, message_date);

    SELECT ...
      FROM a
      INNER JOIN r ON (...)
      INNER JOIN m ON (r.message_key = m.mkey2)
     WHERE ...

I guess your hack leads to better speed, but mine is limited to only
one table, which might be considered cleaner.

Servus
 Manfred
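
PS:  To make the join method experiment concrete, a sketch (nothing
more -- the query below is only a stand-in for your real two-table
join, so the join condition, the WHERE clause, and the date literal are
placeholders you'd replace with what is in your test files):

    SET enable_nestloop = off;   -- steer the planner away from nested loops
    EXPLAIN ANALYZE
    SELECT count(*)
      FROM message_recipients r
      JOIN messages m ON r.message_key = m.message_key
     WHERE m.message_date >= '2004-03-01';
    RESET enable_nestloop;

    SET enable_mergejoin = off;  -- or rule out merge joins instead
    EXPLAIN ANALYZE
    SELECT count(*)
      FROM message_recipients r
      JOIN messages m ON r.message_key = m.message_key
     WHERE m.message_date >= '2004-03-01';
    RESET enable_mergejoin;      -- enable_hashjoin can be toggled the same way

Comparing the EXPLAIN ANALYZE output of these runs shows whether the
join method the planner prefers is really the fastest one.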
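
PPS:  And the batched UPDATE for the mkey2 idea might look roughly like
this (again only a sketch -- the range boundaries are invented, they
assume message_key values lie roughly between 0 and 20000000; pick a
step size that keeps each UPDATE and the following VACUUM reasonably
short):

    UPDATE messages SET mkey2 = message_key
     WHERE message_key >= 0 AND message_key < 1000000;
    VACUUM messages;             -- reclaim the dead row versions

    UPDATE messages SET mkey2 = message_key
     WHERE message_key >= 1000000 AND message_key < 2000000;
    VACUUM messages;

    -- ... and so on, until every row has mkey2 populated

Doing it this way keeps the dead tuples per batch bounded instead of
roughly doubling the table size in one huge UPDATE.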