Hi all, I have a large database with e-mail meta-data (no bodies) for over 100 million messages. I am running PostgreSQL 8.2.4 on a server with 2GB of RAM (shared_buffers = 240MB, temp_buffers = 128MB, work_mem = 256MB, maintenance_work_mem = 256MB). I have the data split in two separate tables, "email" and "email_extras": Table "public.email" Column | Type | Modifiers -------------------+-----------------------------+----------- id | bigint | not null load_id | integer | not null ts | timestamp without time zone | not null ip | inet | not null mfrom | text | not null helo | text | Table "public.email_extras" Column | Type | Modifiers -------------------+-----------------------------+----------- id | bigint | not null ts | timestamp without time zone | not null size | integer | not null hdr_from | text | Each of these tables has been partitioned equally based on the "ts" (timestamp) field into two dozen or so tables, each covering one week of messages. For testing purposes, I have only one week's partition filled for each of the "email" and "email_extras" tables (email_2007_week34 {,extras}). Now if I perform the following simple join on the "email" and "email_ extras" tables ... SELECT count(*) FROM email INNER JOIN email_extras USING (id, ts) WHERE mfrom <> hdr_from; then I get the following horrendously inefficient plan: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=391396890.89..391396890.90 rows=1 width=0) -> Merge Join (cost=9338881.64..349156398.02 rows=16896197148 width=0) Merge Cond: ((public.email_extras.id = public.email.id) AND (public.email_extras.ts = public.email.ts)) Join Filter: (public.email.mfrom <> public.email_extras.hdr_from) -> Sort (cost=4592966.95..4658121.33 rows=26061752 width=48) Sort Key: public.email_extras.id, public.email_extras.ts -> Append (cost=0.00..491969.52 rows=26061752 width=48) -> Seq Scan on email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week13_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week14_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week15_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week16_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week17_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week18_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week19_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week20_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week21_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week22_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week23_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week24_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week25_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week26_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week27_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week28_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week29_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week30_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week31_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week32_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week33_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week34_extras email_extras (cost=0.00..491597.12 rows=26052512 width=33) -> Seq Scan on email_2007_week35_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week36_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week37_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week38_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week39_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Seq Scan on email_2007_week40_extras email_extras (cost=0.00..13.30 rows=330 width=48) -> Sort (cost=4745914.69..4811071.87 rows=26062872 width=48) Sort Key: public.email.id, public.email.ts -> Append (cost=0.00..644732.72 rows=26062872 width=48) -> Seq Scan on email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week13 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week14 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week15 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week16 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week17 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week18 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week19 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week20 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week21 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week22 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week23 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week24 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week25 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week26 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week27 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week28 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week29 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week30 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week31 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week32 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week33 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week34 email (cost=0.00..644349.12 rows=26052512 width=33) -> Seq Scan on email_2007_week35 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week36 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week37 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week38 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week39 email (cost=0.00..13.70 rows=370 width=48) -> Seq Scan on email_2007_week40 email (cost=0.00..13.70 rows=370 width=48) (68 rows) However, if I restrict the query to just the partitions that actually do have data in them ... SELECT count(*) FROM email_2007_week34 INNER JOIN email_2007_week34_extras USING (id, ts) WHERE mfrom <> hdr_from; then I get a much better plan that uses a hash join: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=4266338.94..4266338.95 rows=1 width=0) -> Hash Join (cost=1111362.80..4266336.07 rows=1145 width=0) Hash Cond: ((email_2007_week34.ts = email_2007_week34_extras.ts) AND (email_2007_week34.id = email_2007_week34_extras.id)) Join Filter: (email_2007_week34.mfrom <> email_2007_week34_extras.hdr_from) -> Seq Scan on email_2007_week34 (cost=0.00..644349.12 rows=26052512 width=33) -> Hash (cost=491597.12..491597.12 rows=26052512 width=33) -> Seq Scan on email_2007_week34_extras (cost=0.00..491597.12 rows=26052512 width=33) (7 rows) Yes, I have `ANALYZE`d the database before running the queries. How come the query planner gets thrown off that far by the simple table partitioning? What can I do to put the query planner back on the right track? Julian.
Attachment:
signature.asc
Description: This is a digitally signed message part.