Hello,
Have you tried increasing the statistics target for orderdate and
rerunning analyze?
Sincerely,
Joshua D. Drake
David Brown wrote:
I'm doing some performance profiling with a simple two-table query:
SELECT L."ProductID", sum(L."Amount")
FROM "drinv" H
JOIN "drinvln" L ON L."OrderNo" = H."OrderNo"
WHERE
("OrderDate" between '2003-01-01' AND '2003-04-30')
GROUP BY L."ProductID"
drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual data size in the large table is 500-600MB. OrderNo is indexed in both tables, as is OrderDate.
The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 from Mammoth). I've tried tweaking various conf parameters, but apart from using up memory, nothing seems to have had a tangible effect - the Analyzer doesn't seem to take resources into account like some of the doco suggests.
The date selection represents about 5% of the range. Here's the plan summaries:
Three months (2003-01-01 to 2003-03-30) = 1 second
HashAggregate (cost=119365.53..119368.74 rows=642 width=26)
-> Nested Loop (cost=0.00..118791.66 rows=114774 width=26)
-> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..200.27 rows=3142 width=8)
Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-03-30'::date))
-> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..28.73 rows=721 width=34)
Index Cond: (l."OrderNo" = "outer"."OrderNo")
Four months (2003-01-01 to 2003-04-30) = 60 seconds
HashAggregate (cost=126110.53..126113.74 rows=642 width=26)
-> Hash Join (cost=277.55..125344.88 rows=153130 width=26)
Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo")
-> Seq Scan on drinvln l (cost=0.00..106671.35 rows=3372935 width=34)
-> Hash (cost=267.07..267.07 rows=4192 width=8)
-> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8)
Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date))
Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds
HashAggregate (cost=130565.83..130569.04 rows=642 width=26)
-> Merge Join (cost=519.29..129800.18 rows=153130 width=26)
Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo")
-> Sort (cost=519.29..529.77 rows=4192 width=8)
Sort Key: h."OrderNo"
-> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8)
Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date))
-> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..119296.29 rows=3372935 width=34)
Statistics were run on each table before query execution. The random page cost was lowered to 2, but as you can see, the estimated costs are wild anyway.
As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster.
MySQL (InnoDB) took 2 seconds, which is 30 times faster.
The query looks straightforward to me (it might be clearer with a subselect), so what on earth is wrong?
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@xxxxxxxxxxxxxx
--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@xxxxxxxxxxxxxxxxx - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
begin:vcard
fn:Joshua D. Drake
n:Drake;Joshua D.
org:Command Prompt, Inc.
adr:;;PO Box 215;Cascade Locks;Oregon;97014;USA
email;internet:jd@xxxxxxxxxxxxxxxxx
title:Consultant
tel;work:503-667-4564
tel;fax:503-210-0334
note:Command Prompt, Inc. is the largest and oldest US based commercial PostgreSQL support provider. We provide the only commercially viable integrated PostgreSQL replication solution, but also custom programming, and support. We authored the book Practical PostgreSQL, the procedural language plPHP, and adding trigger capability to plPerl.
x-mozilla-html:FALSE
url:http://www.commandprompt.com/
version:2.1
end:vcard