On Wed, 2007-02-21 at 10:54, Adam Rich wrote: > > It's got a query parser that's dumb as a brick. > > While we're on this topic... I have a question on these series > of queries: > > -- Query A > select count(*) from customers c > where not exists ( select 1 from orders o > where o.customer_id = c.customer_id ) > > -- Query B > select count(*) from customers c > where customer_id not in ( select customer_id from orders) > > -- Query C > select count(*) from customers c > left join orders o on c.customer_id = o.customer_id > where o.order_id is null > > > I believe they all achieve the same thing. i.e. How many > customers have never placed an order? I ran these 3 on > MySQL & PG with the following results: > > Query A: MySQL=4.74s PostgreSQL=4.23s > Query B: MySQL=4.64s PostgreSQL=????? > Query C: MySQL=5.07s PostgreSQL=3.39s > > MySQL's time is pretty consistent for all 3. As you said, > the output from explain is pretty useless so there's not > much else to look at. > > PostgreSQL runs A&C slightly faster, which I expected. > However, waiting for query B exceeded my patience and > I had to abort it. The explain output is below, is this > result due to some incorrect setting? Nope, more like incorrect usage / inability to optimize by postgresql due to architecture. The B query (like the B arc) is a bad choice here because PostgreSQL has to actually create a giant OR list of all the customer_ids from order. But the queries I was referring to were more along the lines of multiple level subselect queries with lots of aggregation on the outside, the kind used for business intelligence reporting. There might be some optimization trick for the B query I'm not familiar with (cause every time I turn around, Tom has gone and made the query optimizer smarter) but I haven't heard of it.