
Re: Compound keys and foreign constraints

On Mon, 05 Apr 2004 11:55:17 -0500, Wes Palmer
<Wesley.R.Palmer@syntegra.com> wrote:
>> match, there's no need to fetch the heap tuple.  Unfortunately the
>> planner doesn't believe that this is possible:
>> /* Don't believe estimates less than 1... */
>> if (tuples_fetched < 1.0)
>>     tuples_fetched = 1.0;
>
>Unless you are fetching a majority of the rows, wouldn't it always be
>cheaper to consult the index instead of reading the full record?

In many (if not most) cases, yes; always, no.  Think of a table with
100 rows per page.  To select 1% of the rows you might have to read
every page of the table -- at least theoretically in the worst case, but
you get the point.  It is the optimizer's job to find out which one of a
possibly large number of query plans is the best.  Unfortunately the
optimizer is not perfect (yet :-)).  The code snippet above might be a
spot where some improvement is possible.  If this really contributes to
your case, you have to live with it for now.
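
As a back-of-the-envelope illustration (hypothetical numbers, not
yours): with 100 rows per page, a uniformly scattered 1% selection hits
roughly one row per page, i.e. nearly every page of the table, and with
random_page_cost=4 those scattered fetches are estimated at about four
times the cost of reading the same pages sequentially.  Something along
these lines makes the effect visible (just a sketch, assuming
generate_series is available on your version):

	-- hypothetical demo table, roughly 100 rows per 8K page
	CREATE TABLE demo AS
	  SELECT g AS id, (g % 100) AS bucket, repeat('x', 60) AS pad
	    FROM generate_series(1, 100000) AS g;
	CREATE INDEX demo_bucket ON demo(bucket);
	ANALYZE demo;
	-- bucket = 0 matches 1% of the rows, but about one per page,
	-- so the planner may well prefer a sequential scan:
	EXPLAIN SELECT * FROM demo WHERE bucket = 0;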

>>  ->  Nested Loop  (cost=0.00..75565.60 rows=1 width=0)
>>  ->  Nested Loop  (cost=0.00..73745.26 rows=11 width=0)

>In the above example, the first case is where both the compound and the
>single-column indexes existed.  I ran the test, deleted the single-column
>index, then ran the test again.

Yes, this is what I understood.  What I don't understand is why the
optimizer didn't choose the second plan in the first run.

>  I did not run vacuum or analyze in between.

Auto-vacuum?

>> more readable with \x):
>See enclosed tar file.  f1 is the first couple of commands.  f1a is with
>random_page_cost=4 and f1b is with random_page_cost=.8

	SELECT * FROM pg_stats WHERE tablename='messages' ?

What were the other settings (sort_mem, effective_cache_size)?
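
You can check them with:

	SHOW sort_mem;
	SHOW effective_cache_size;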

>You can see that I get the same plan (filter vs. index) even keeping the
>addresses table out of the picture.

Now that you join only two tables, you could experiment with forcing
other join methods (SET enable_nestloop, enable_mergejoin,
enable_hashjoin), as in the sketch below.
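
A sketch of such an experiment (the settings are per-session; plug your
two-table query in for the ...):

	SET enable_nestloop = off;	-- discourage nested loops
	EXPLAIN ANALYZE SELECT ...;	-- your two-table query here
	RESET enable_nestloop;

	SET enable_mergejoin = off;	-- then try without merge joins, etc.
	EXPLAIN ANALYZE SELECT ...;
	RESET enable_mergejoin;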

>It occurs to me that de-normalizing it a bit and putting a duplicate
>message_date in the message_recipients may have been a good thing to do.
>The result set could have been obtained quicker.  But, I was trying to keep
>everything normalized as much as possible and wanted to keep the
>message_recipients as small as possible because it would be growing the
>fastest.  Trying to put that in now would be a bit of a challenge, since I'd
>have to update 70 million records based on the value in 20 million records.

Another de-normalization idea:

	ALTER TABLE messages ADD COLUMN mkey2 numeric(12,0);
	UPDATE messages SET mkey2 = message_key;

Do this in batches and run VACUUM between the UPDATEs.
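
One batch might look like this (hypothetical range boundaries; pick a
step that keeps each UPDATE to a manageable number of rows):

	UPDATE messages
	   SET mkey2 = message_key
	 WHERE message_key >= 0 AND message_key < 1000000
	   AND mkey2 IS NULL;
	VACUUM messages;
	-- repeat with the next key range ...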

	DROP INDEX messages_i_id_mdate;
	CREATE INDEX ... ON messages(mkey2, message_date);

	SELECT ...
	  FROM a INNER JOIN r ON (...)
	         INNER JOIN m ON (r.message_key = m.mkey2)
	 WHERE ...
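
Spelled out with hypothetical names for the columns the thread doesn't
show (only message_key and message_date are known), that might read:

	SELECT m.message_key, m.message_date
	  FROM addresses a
	       INNER JOIN message_recipients r ON (a.address_key = r.address_key)
	       INNER JOIN messages m ON (r.message_key = m.mkey2)
	 WHERE a.address = 'someone@example.com'
	   AND m.message_date > '2004-01-01';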

I guess your hack gives better speed, but mine is confined to a single
table, which might be considered cleaner.

Servus
 Manfred
