Performance of outer joins?

Benjamin Smith <ben@xxxxxxxxxxxxxxxxxx> · Fri, 15 Dec 2006 15:59:12 -0800

I have a situation that can be summarized to the following: 

-- day in 20061215 format 
Create table calendar (
	day integer unique not null
	); 

Create table customers (
	id serial unique not null, 
	name varchar, 
	address varchar, 
	); 

Create table deliveries (
	customers_id integer not null references customers(id), 
	calendar_day integer not null references calendar(day), 
	delivered bool not null default false, 
	unique(customers_id, calendar_id)
	); 

Imagine tens of thousands of customers, a few million deliveries. A query 
that's structurally similar to the following query is rather slow. It's 
taking over 30 seconds, climbing fast on reasonable HW. (SMP Opteron, 10k 
SCSI, 4 GB RAM) If I remove the outer join, performance is < 1 second. 

SELECT customers.id as customers_id, 
	customers.name AS customers_name, 
	calendar.day AS calendar_day, 
	CASE WHEN (deliveries.delivered IS NULL) THEN 'n/a' 
		WHEN (deliveries.delivered=TRUE) THEN 'yes'
		ELSE 'no' END AS delivered
	FROM customers 
	JOIN calendars ON 
		(
		-- GIVE A CALENDAR OF POSSIBLE DAYS FOR DELIVERIES 
		calendar.day < 20061201
		AND calendar.day >= 20060101
		) 
	LEFT OUTER JOIN deliveries ON 
		( 
		customers.id=deliveries.customers_id 
		AND deliveries.calendar_day=calendar.day 
		) 
	; 

What can I do to improve the performance of this oft-used query? Is there a 
better way to do this, or am I doomed to looping thru results and parsing the 
results in code? 

Thanks, 

-Ben 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous