Re: postgresql vs mysql

Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> · Wed, 21 Feb 2007 11:19:16 -0600

On Wed, 2007-02-21 at 10:54, Adam Rich wrote:
> > It's got a query parser that's dumb as a brick. 
> 
> While we're on this topic...  I have a question on these series
> of queries:
> 
> -- Query A
> select count(*) from customers c
> where not exists ( select 1 from orders o
> where o.customer_id = c.customer_id )
> 
> -- Query B
> select count(*) from customers c
> where customer_id not in ( select customer_id from orders)
> 
> -- Query C
> select count(*) from customers c
> left join orders o on c.customer_id = o.customer_id
> where o.order_id is null
> 
> 
> I believe they all achieve the same thing.  i.e. How many
> customers have never placed an order?  I ran these 3 on
> MySQL & PG with the following results:
> 
> Query A:  MySQL=4.74s  PostgreSQL=4.23s
> Query B:  MySQL=4.64s  PostgreSQL=?????
> Query C:  MySQL=5.07s  PostgreSQL=3.39s
> 
> MySQL's time is pretty consistent for all 3.  As you said,
> the output from explain is pretty useless so there's not
> much else to look at.
> 
> PostgreSQL runs A&C slightly faster, which I expected.
> However, waiting for query B exceeded my patience and 
> I had to abort it.  The explain output is below, is this
> result due to some incorrect setting?

Nope, more like incorrect usage / inability to optimize by postgresql
due to architecture.  The B query (like the B arc) is a bad choice here
because PostgreSQL has to actually create a giant OR list of all the
customer_ids from order.  

But the queries I was referring to were more along the lines of multiple
level subselect queries with lots of aggregation on the outside, the
kind used for business intelligence reporting.

There might be some optimization trick for the B query I'm not familiar
with (cause every time I turn around, Tom has gone and made the query
optimizer smarter) but I haven't heard of it.