On 26/09/2012 17:03, Shaun Thomas wrote:
On 09/26/2012 01:38 PM, Robert Sosinski wrote:
I seem to be getting an inaccurate cost from explain. Here are two
examples for one query with two different query plans:
Well, there's this:
Nested Loop (cost=0.00..151986.53 rows=2817 width=4) (actual
time=163.275..186869.844 rows=43904 loops=1)
If anything's a smoking gun, that is. I could see why you'd want to
turn off nested loops to get better execution time. But the question
is: why did it think it would match so few rows in the first place?
The planner probably would have thrown away this query plan had it
known it would return roughly 15x more rows than it estimated.
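If you just want to confirm that the nested loop is the problem, a quick
diagnostic (not a fix) is to discourage it for a single session and compare
plans, something like:

SET enable_nestloop = off;  -- session-local planner setting, purely for testing
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT eu.id)   -- the query from the original post
FROM exchange_uploads eu
JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
LEFT JOIN uploads u ON u.id = eu.upload_id
LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
LEFT JOIN exchanges e ON e.id = ud.exchange_id
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
  AND ud.office_id = 6;
RESET enable_nestloop;      -- put the setting back afterwards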
I think we need to know what your default_statistics_target is set to,
and really, all of your relevant PostgreSQL settings.
Please see this:
http://wiki.postgresql.org/wiki/Slow_Query_Questions
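As a starting point, something like this pulls the usual suspects without
changing anything:

SHOW default_statistics_target;

-- Planner-related settings most often asked about on this list:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('default_statistics_target', 'work_mem', 'shared_buffers',
               'effective_cache_size', 'random_page_cost');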
But you may also need to look a lot more into the query itself. The
difference between a 2-minute and a 3-minute query isn't going to help
you much. Over here, we tend to spend more of our time turning 2- or
3-minute queries into 20 or 30 ms queries. But judging by your date
range, getting the last two months of data from a table that large
generally won't be fast by any means.
That said, looking at your actual query:
SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
LEFT JOIN uploads u ON u.id = eu.upload_id
LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
LEFT JOIN exchanges e ON e.id = ud.exchange_id
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;
It doesn't need half of these joins. They're left joins that are never
used in the query results or the WHERE criteria. You could just use this:
Interesting. I have a similar situation, where the user can choose a set
of filters, and the query must then carry several left joins "just in case"
(the user may need them in the filter).
I know of another database that can remove unnecessary outer joins from a
query when they are not relevant, which makes it faster.
Can't PostgreSQL do the same?
Regards,
Edson.
SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;
Though I presume this is just a count precursor to a query that
fetches the actual results and does need the left joins. Either way,
the index scan from your second example matches 3.3M rows by using the
created_at index on exchange_uploads. That's not really very
restrictive, and so you have two problems:
1. Your nested-loop row estimates coming from office_id are somehow
wrong. Try increasing the statistics target on that column, or just
default_statistics_target in general, and re-analyze (see the sketch
after this list).
2. Your created_at criteria above match way too many rows, and will
also take a long time to process.
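On the first point, a minimal sketch of the per-column route (the 500 is
just an example target, not a recommendation):

-- Raise the statistics target for the badly estimated column, then re-analyze.
ALTER TABLE upload_destinations ALTER COLUMN office_id SET STATISTICS 500;
ANALYZE upload_destinations;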
Those are your two actual problems. We can probably get your query to
run faster, but those are pretty significant hurdles.
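For the second point, a rough way to see how unselective that created_at
cutoff really is (plain counts, so expect them to be slow on a table this
size):

-- Rows matched by the date predicate vs. total rows in the table.
SELECT count(*) AS matched
FROM exchange_uploads
WHERE created_at >= '2012-07-27 21:21:57.363944';

SELECT count(*) AS total
FROM exchange_uploads;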