On 05/12/2011 09:51 AM, Tom Lane wrote:

> It does. I was a bit surprised that Shaun apparently got a plan that
> didn't include a materialize step, because when I test a similar query
> here, I get:

Remember when I mentioned the "old version" that prevented us from using
CTEs? We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0).
The planner is basically getting the group aggregation estimate wrong,
but is that enough to make it go nuts?

SELECT c.*
  FROM customer c
  JOIN (SELECT session_id, count(1) as counter
          FROM session
         WHERE date_created >= '2011-05-11 05:00'
           AND date_created < '2011-05-11 06:00'
           AND from_interface = 'UNKNOWN'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE - interval '1 day'
   AND c.row_out IS NULL;

So sayeth the planner:

 Nested Loop  (cost=167.49..2354.62 rows=6 width=237) (actual time=43.949..166858.604 rows=168 loops=1)
   ->  GroupAggregate  (cost=167.49..176.97 rows=2 width=8) (actual time=1.042..2.827 rows=209 loops=1)
         ->  Sort  (cost=167.49..170.64 rows=1260 width=8) (actual time=1.037..1.347 rows=230 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..102.61 rows=1260 width=8) (actual time=0.044..0.690 rows=230 loops=1)
                     Index Cond: ((date_created >= '11-MAY-11 05:00:00'::timestamp without time zone) AND (date_created < '11-MAY-11 06:00:00'::timestamp without time zone))
                     Filter: ((from_interface)::text = 'UNKNOWN'::text)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1088.78 rows=3 width=237) (actual time=19.820..798.348 rows=1 loops=209)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = (('now'::text)::date - '@ 1 day'::interval)) AND (row_out IS NULL))
 Total runtime: 166859.040 ms

That one-hour extract is much, much slower than this:

SELECT 1
  FROM customer c
  JOIN (SELECT session_id, count(*) as counter
          FROM session
         WHERE date_created >= '2011-05-08'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE
   AND c.row_out IS NULL;

Which gives this plan:

 Merge Join  (cost=244565.52..246488.78 rows=377 width=0) (actual time=1958.781..2385.667 rows=22205 loops=1)
   Merge Cond: (a.session_id = c.customer_id)
   ->  GroupAggregate  (cost=19176.22..20275.99 rows=271 width=8) (actual time=1142.179..1459.779 rows=26643 loops=1)
         ->  Sort  (cost=19176.22..19541.68 rows=146184 width=8) (actual time=1142.152..1374.328 rows=179006 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..6635.51 rows=146184 width=8) (actual time=0.020..160.339 rows=179267 loops=1)
                     Index Cond: (date_created >= '08-MAY-11 00:00:00'::timestamp without time zone)
   ->  Sort  (cost=225389.30..225797.47 rows=163267 width=8) (actual time=816.585..855.459 rows=155067 loops=1)
         Sort Key: c.customer_id
         ->  Index Scan using idx_customer_rpt on customer c  (cost=0.00..211252.93 rows=163267 width=8) (actual time=0.037..90.337 rows=155067 loops=1)
               Index Cond: (process_date = '10-MAY-11 00:00:00'::timestamp without time zone)
               Filter: (row_out IS NULL)

But make the inner query slightly smaller, and...
 Nested Loop  (cost=13755.53..223453.98 rows=276 width=0)
   ->  GroupAggregate  (cost=13755.53..14558.26 rows=198 width=8)
         ->  Sort  (cost=13755.53..14022.28 rows=106700 width=8)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..4844.37 rows=106700 width=8)
                     Index Cond: (date_created >= '09-MAY-11 00:00:00'::timestamp without time zone)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1055.01 rows=1 width=8)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = '10-MAY-11 00:00:00'::timestamp without time zone) AND (row_out IS NULL))

I didn't want to wait two hours for that to finish. ;) But the stats are
all pretty darn close, so far as I can tell. The only thing that's off is
the group aggregate's row estimate... by about two orders of magnitude.
So I just chalked it up to 8.2 being relatively horrible, and punted to
using a temp table to trick the optimizer into doing it right (roughly
the approach sketched in the P.S. below).

But my greater point was that even doing it all in SQL doesn't always
work, as we all know. Use of EXPLAIN abounds, but that doesn't
necessarily mean a dev will know how to fix a bad plan.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx
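
P.S. For reference, the temp table trick looks roughly like this. It's
just a sketch of the idea rather than our actual code, and the temp table
name is made up; the ANALYZE is there so the planner joins against real
row counts instead of its GroupAggregate guess:

-- Materialize the aggregate first so the planner sees real numbers.
CREATE TEMP TABLE tmp_session_agg AS
SELECT session_id, count(1) AS counter
  FROM session
 WHERE date_created >= '2011-05-11 05:00'
   AND date_created < '2011-05-11 06:00'
   AND from_interface = 'UNKNOWN'
 GROUP BY 1;

-- Give the planner stats on the temp table before the join.
ANALYZE tmp_session_agg;

SELECT c.*
  FROM customer c
  JOIN tmp_session_agg a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE - interval '1 day'
   AND c.row_out IS NULL;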