Search Postgresql Archives

Re: Two slightly different queries produce same results,

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andrew Schmidt wrote:
However, I've run into a problem where one query took about twice as long as innodb. Some investigation and playing around with the query, I ended up with a slightly different query but was about 3 times as fast as innodb (and 5 times faster than the original query). I didn't add any indices and the EXPLAIN ANALYZE doesn't show any new indices being used with the new query. The order of which tables are read first was changed in the query planner and also it uses HashAggregate vs GroupAggregate.. not entirely sure what that means.

It's a different method of aggregating data (calculating your sum()). The HashAggregate puts values into buckets and works best when you have a fairly small number of different values (so each value can get its own bucket). I think - bear in mind I'm just another user, so I'm not familiar with the internals.

The different plans are because PG thinks it will get different numbers of rows...

SELECT tp.team_id, pl.position = 'G' AS is_goalie, SUM(goals) AS total_goals
FROM
team t JOIN team_players tp ON tp.team_id = t.team_id
JOIN player_updates pu ON pu.player_id = tp.player_id
JOIN player pl ON pl.player_id = pu.player_id
WHERE t.pool_id = 21699
GROUP BY tp.team_id,is_goalie;

Explain analyze of this query:

-> Index Scan using team_pool_id on team t (cost=0.00..397.90 rows=391 width=4) (actual time=0.061..2.129 rows=313 loops=1)
                                Index Cond: (pool_id = 21699)

With this index scan you have one condition and PG expects 391 matching rows (it gets 313, so that's a pretty good estimate).

SELECT tp.team_id,pl.position = 'G' AS is_goalie,SUM(goals) AS total_goals
FROM
team t JOIN team_players tp ON tp.team_id = t.team_id AND t.pool_id = 21699 <--- ** added pool_id
JOIN player_updates pu ON pu.player_id = tp.player_id
JOIN player pl ON pl.player_id = pu.player_id
WHERE t.pool_id = 21699
GROUP BY tp.team_id,is_goalie;

The explain analyze:
-> Index Scan using team_pool_id on team t (cost=0.00..10.44 rows=5 width=4) (actual time=0.049..1.556 rows=313 loops=1) Index Cond: ((pool_id = 21699) AND (pool_id = 21699))

Here it sees two conditions, and PG is expecting that will restrict things further. It's clearly not expecting the same condition twice, but is presumably treating it something like "a>-4 AND a<4". This actually makes the row estimate wrong.

So - if PG is getting this bit wrong why is query 2 faster?
Looking at the row estimates for case 1 we can see that the final cases are badly wrong.

> GroupAggregate  (cost=8742.52..9853.85 rows=44453 width=13) (actual
> time=1186.973..1432.548 rows=626 loops=1)

Here it's got the number of rows badly wrong - if you actually ended up with 44,000 rows then maybe this would be the best plan. This is the last step though, so what happens before?

>   ->  Sort  (cost=8742.52..8853.66 rows=44453 width=13) (actual
> time=1186.237..1309.562 rows=40928 loops=1)
>         Sort Key: tp.team_id, (pl."position" = 'G'::bpchar)

We're about right here. The problem must be in the last step, and looking at the row estimates it seems to think that no summarising is actually going to take place.

I think the "position = 'G'" bit is confusing it. Well, we can test that:

1. Just group by "pl.position" - remove the test. Do the estimated and actual rows match now?

2. Rewrite the position='G' as a CASE structure

3. Do the position='G' part in a sub-query and join to that.

The results of those should show whether this is actually the problem.
--
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux