Bad plan for join to aggregate of join.

Mischa Sandberg <mischa@xxxxxxxxxxxxx> · Tue, 12 Sep 2006 13:59:56 -0700

PG 8.0.3 is choosing a bad plan between a query.
I'm going to force the plan (by making one join into a function).

I'd like to know if this is unexpected; in general,
can PG see that a join on an grouped-by field
can be pushed down into the query as an indexable filter?

The query below joins a table "message", to an aggregate of 
"message_recipient" joined to "recipient". The joins are all on
indexed PK-FK columns. "message_recipient" is an intersect table.

          message :<: message_recipient :>: recipient

In the query plan below, the right side of the join returns one row of "message", 
and PG knows it.

The left side of the join compute the entire aggregate of "message_recipient"
(est 700K rows), then does a merge join against the single message row.

I would have hoped for a nested-loop join, where the message "id"
field would be used to index-scan "message_recipient",
which in turn would index-scan "recipient" by recipient "id".

This is PG 8.0.3. All tables have been (very) recently analyzed.
The query plans estimated rowcounts all look bang-on.
"message" and "message_recipient" are tables of about 3M rows each.

As usual, this is on a system to which I only have restricted access.
But I'd be happy to expand on the info below with anything short of
the pg_dump.

-----------------------------------========================================================

EXPLAIN
SELECT  message.id AS m_db_id, message.m_global_id AS id, m_global_id, m_queue_id, h_message_id,
       m_date AS c_date_iso, m_date, c_subject_utf8, message.reason_id AS reason_id,
       m_reason.name AS m_reason, m_spam_probability, m_spam_level, h_to, m_message_size,
       m_header_size, date_part('epoch', message.m_date) AS c_qdate_time,
       h_from_local || '@' || h_from_domain AS h_from,
       env_from_local || '@' || env_from_domain AS env_from,
       env_from_local || '@' || env_from_domain AS m_envelope_from, location_name AS location,
       m_milter_host, m_relay, virus_name AS m_virus_name, m_all_recipients
FROM message
JOIN m_reason ON message.reason_id = m_reason.reason_id
JOIN message_all_recipients ON message.id = message_all_recipients.m_id
WHERE message.m_global_id = '2211000-1';

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop  (cost=254538.42..283378.44 rows=1 width=425)
Join Filter: ("outer".reason_id = "inner".reason_id)
->  Merge Join  (cost=254538.42..283377.33 rows=1 width=416)
    Merge Cond: ("outer".m_id = "inner".id)
    ->  Subquery Scan message_all_recipients  (cost=254535.40..281604.95 rows=707735 width=40)
        ->  GroupAggregate  (cost=254535.40..274527.60 rows=707735 width=36)
            ->  Sort  (cost=254535.40..258250.57 rows=1486069 width=36)
                Sort Key: message_recipient.message_id
                ->  Merge Join  (cost=0.00..78970.52 rows=1486069 width=36)
                    Merge Cond: ("outer".id = "inner".recipient_id)
                    ->  Index Scan using pk_recipient on recipient  (cost=0.00..5150.65 rows=204514 width=36)
                    ->  Index Scan using pk_message_recipient on message_recipient  (cost=0.00..56818.25 rows=1486069 width=16)
                         Filter: (is_mapped = 1)
    ->  Sort  (cost=3.02..3.03 rows=1 width=384)
        Sort Key: message.id
        ->  Index Scan using unq_message_m_global_id on message  (cost=0.00..3.01 rows=1 width=384)
            Index Cond: ((m_global_id)::text = '2211000-1'::text)
->  Seq Scan on m_reason  (cost=0.00..1.04 rows=4 width=13)

----------------------------------- Relevant tables and view:

# \d message
                                          Table "public.message"
      Column       |            Type             |                        Modifiers
--------------------+-----------------------------+---------------------------------------------------------
id                 | bigint                      | not null default nextval('public.message_id_seq'::text)
m_global_id        | character varying(255)      | not null
reason_id          | smallint                    | not null
location_name      | character varying(255)      | not null
m_date             | timestamp without time zone |
m_queue_id         | character varying(255)      |
h_message_id       | character varying(255)      |
c_subject_utf8     | character varying(255)      |
env_from_local     | character varying(255)      |
env_from_domain    | character varying(255)      |
h_from_local       | character varying(255)      |
h_from_domain      | character varying(255)      |
h_from             | character varying(255)      |
h_to               | character varying(255)      |
m_milter_host      | character varying(255)      |
m_relay            | character varying(255)      |
m_spam_probability | double precision            |
m_message_size     | integer                     |
m_header_size      | integer                     |
m_spam_level       | character varying(255)      |
virus_name         | text                        |
Indexes:
   "pk_message" PRIMARY KEY, btree (id)
   "unq_message_m_global_id" UNIQUE, btree (m_global_id)
   "message_h_message_id_index" btree (h_message_id)
   "message_m_date_index" btree (m_date)
   "message_m_queue_id_index" btree (m_queue_id)

# \d message_recipient
      Table "public.message_recipient"
   Column     |   Type   |     Modifiers
---------------+----------+--------------------
recipient_id  | bigint   | not null
message_id    | bigint   | not null
is_mapped     | smallint | not null default 0
is_calculated | smallint | not null default 0
is_envelope   | smallint | not null default 0
reason_id     | smallint | not null
action        | smallint |
Indexes:
   "pk_message_recipient" PRIMARY KEY, btree (recipient_id, message_id)
   "message_recipient_message_id_index" btree (message_id)
Foreign-key constraints:
   "rc_rcpnt_map_msg_id" FOREIGN KEY (message_id) REFERENCES message(id) ON DELETE CASCADE

CREATE AGGREGATE catenate (
   BASETYPE = text,
   SFUNC = textcat,
   STYPE = text,
   INITCOND = ''
);

CREATE OR REPLACE VIEW message_all_recipients AS
SELECT  message_id AS m_id,
       substr(catenate(','||local||'@'||domain),2) AS m_all_recipients
FROM    message_recipient
JOIN    recipient ON id = recipient_id
WHERE   is_mapped = 1
GROUP BY message_id;

----------------------------------- pg_statistics info, problably not of much interest
Object                                   DiskIO    CacheIO      Ins     Upd      Del SeqScan      TupRead    IdxScan    IdxFetch
m_reason                                    308     599679        1       0        0  599985      2399935          0           0
message                                 4658766   14977816  2210967       0   933643    7299     81428503    5855900     8833404
message.pk_~                             227834   31683671        0       0        0       0      3897054    5850229     3897054
message.unq_~_m_global_id                252753    8591251        0       0        0       0         5552       5564        5552
message.~_h_~_id_index                  1879172    8496722        0       0        0       0            0          0           0
message.~_m_date_index                   245405    8526765        0       0        0       0      4930798        107     4930798
message.~_m_queue_id_index               245719    8598360        0       0        0       0            0          0           0
message_recipient                      41862572   81546465  2703260     104  1144977       0            0    2648101   117192003
message_recipient.pk_~                  4541776   16430539        0       0        0       0    116042206    1710555   116042206
message_recipient.~_message_id_index     243379   14235956        0       0        0       0      1149797     937546     1149797
recipient                              55288623  955926871   223057       0   112158  584592 103499192990    5726999    62036712
recipient.pk_~                           180080    1125073        0       0        0       0      7440446     117045     7440446
recipient.unq_~                         2205366   21513447        0       0        0       0     54166857    5609472    54166857
recipient.~_domain_index                 191722     734683        0       0        0       0       429409        482      429409

----------------------------------- output of "pgdisk", showing actual disk space vs pg_class info:
..DISK-KB ..DATA-KB ...EST-KB .EST-ROWS ...OID.... NAME
 1625360   1021104    979000   1315620      17261 public.message
  369208    159200    159032   1558240      17272 public.message_recipient
   45752     16408     14552    181646      17293 public.recipient