
RE: Query very different speeds on seemingly similar data

>-----Original Message-----
>From: David Rowley <david.rowley@xxxxxxxxxxxxxxx> 
>Sent: 05 June 2019 01:48
>To: Rob Northcott <Rob.Northcott@xxxxxxxxxxxxxx>
>Cc: pgsql-general@xxxxxxxxxxxxxxxxxxxx
>Subject: Re: Query very different speeds on seemingly similar data

>On Wed, 5 Jun 2019 at 04:55, Rob Northcott <Rob.Northcott@xxxxxxxxxxxxxx> wrote:
>> Explain files attached (assuming attachments will get through to the 
>> group – otherwise what’s the best way to post it?)

>You're best to post the EXPLAIN ANALYZE output to https://explain.depesz.com. What you attached is pretty horrible to read, and details are only gained by hovering the mouse cursor over the node. I for one struggle to work out what's the inner and outer sides of the join with the output you've given, and can only work it out by looking at the nested loop plan to see which side the parameter is on.
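
A minimal way to capture that text-format output is to run the statement from psql rather than pgAdmin (the query below is only a stand-in for the real one, which isn't shown in the thread):

    -- Run in psql; BUFFERS adds I/O detail, which explain.depesz.com
    -- also understands.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT stk_key FROM stock LIMIT 10;  -- stand-in for the real query
    -- Paste the full text output into https://explain.depesz.com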

>It appears that the sub-query plan has changed from a Nested Loop plan to a Merge Join. The Merge Join is a pretty bad plan since the index that provides the pre-sorted input must filter out many non-matching rows. It's not quite clear to me why the planner chooses that index, mostly because I'm too lazy to learn the output you've shown the plans in, but if you did have an index on sales_invoicedetails (std_stk_key, std_unique), then the Merge Join plan would likely be better, or at least better than the current Merge Join plan.
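
A sketch of the index being suggested; the table and column names come from the thread, while the index name is invented:

    -- Composite index so the Merge Join's pre-sorted input needn't
    -- filter out non-matching rows (index name is a placeholder).
    CREATE INDEX sales_invoicedetails_stk_key_unique_idx
        ON sales_invoicedetails (std_stk_key, std_unique);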

>You'll likely want to ensure that random_page_cost has not been set to something insane on the Merge Join instance. I'd also check seq_page_cost and effective_cache_size.
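
Those three settings are quick to check from any session, for example:

    -- Show the three planner settings mentioned above, and where each
    -- value came from (default, postgresql.conf, per-database, etc.).
    SELECT name, setting, source
    FROM pg_settings
    WHERE name IN ('random_page_cost', 'seq_page_cost',
                   'effective_cache_size');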

>More recommendations might be easier to give if you show the plans in a better format.

>Apart from that, instead of the sub-query you could also consider a LEFT JOIN to a query similar to your sub-query, but with the std_stk_key = stock.stk_key condition removed and a GROUP BY std_stk_key added. However, that may perform worse if there are many more std_stk_key groups than there are matching rows in stock.stk_key.
>Experimentation might be required there.
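
A rough sketch of that rewrite; only std_stk_key, stk_key, and the two table names come from the thread, and the SUM(std_qty) aggregate is invented for illustration:

    -- Before: correlated sub-query, evaluated once per stock row.
    SELECT s.stk_key,
           (SELECT SUM(d.std_qty)
            FROM sales_invoicedetails d
            WHERE d.std_stk_key = s.stk_key) AS total
    FROM stock s;

    -- After: aggregate once per std_stk_key, then join the results.
    SELECT s.stk_key, d.total
    FROM stock s
    LEFT JOIN (SELECT std_stk_key, SUM(std_qty) AS total
               FROM sales_invoicedetails
               GROUP BY std_stk_key) d ON d.std_stk_key = s.stk_key;

As the quoted text notes, the second form helps when most groups are needed, and can hurt when stock matches only a few of the std_stk_key groups.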


>--
> David Rowley                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services





Thanks for the reply, David.

Those output files came from pgAdmin, but I agree they're not the easiest to look at.  I've tried writing the query in different ways (which is easy enough with the minimal query here, but not so easy with the original massive query), and it's still way slower on one set of data than on the other.

I'll try changing those settings on the test server and see if it makes any difference... We've got almost 100 databases running the same system (on the same server), and this is the only one that's been a problem so far, but it could of course just be that we've been lucky.
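
Since all the databases share one server, such an experiment could be scoped to the problem database rather than the whole cluster; a sketch, with the database name and values invented:

    -- Per-database override; new connections to this database pick it up.
    ALTER DATABASE problem_db SET random_page_cost = 1.1;
    ALTER DATABASE problem_db SET effective_cache_size = '4GB';
    -- Revert with: ALTER DATABASE problem_db RESET random_page_cost;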

If I can't get anywhere with that I'll try to post the EXPLAIN ANALYZE output in a better format.

Thanks again for your reply

Rob



