Hi Richard,
While in the process of responding to your email I found out what was
doing it. It was the grouping of the team_players team_id instead of
the team team_id.
Here's some table numbers:
team => 31720 rows,
team_players => 464896 rows,
player => 948 rows
player_updates => 5414 rows
The query:
SELECT tp.team_id,SUM(pl.player_id),SUM(goals) as total_goals
FROM
team t JOIN team_players tp ON tp.team_id = t.team_id
JOIN player_updates pu ON pu.player_id = tp.player_id
JOIN player pl ON pl.player_id = pu.player_id
WHERE t.pool_id = 21699
GROUP BY tp.team_id;
It's grouping by the team_players tp.team_id, change the group to
t.team_id and bam! 5x faster.
thanks for the help,
-- Andrew
*******
This was my response to you Richard before I found out what it was.
*******
I removed position from the query, but still got the same query times.
I summed up the pl.player_id just to make sure it was reading the player
table.
EXPLAIN ANALYSE
SELECT tp.team_id,SUM(pl.player_id),SUM(goals) as total_goals
FROM
team t JOIN team_players tp ON tp.team_id = t.team_id
JOIN player_updates pu ON pu.player_id = tp.player_id
JOIN player pl ON pl.player_id = pu.player_id
WHERE t.pool_id = 21699
GROUP BY tp.team_id;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=8584.45..9364.83 rows=22390 width=12) (actual
time=894.364..1140.884 rows=313 loops=1)
-> Sort (cost=8584.45..8695.59 rows=44453 width=12) (actual
time=893.373..1013.989 rows=40928 loops=1)
Sort Key: tp.team_id
-> Hash Join (cost=174.52..4701.14 rows=44453 width=12)
(actual time=50.433..328.788 rows=40928 loops=1)
Hash Cond: ("outer".player_id = "inner".player_id)
-> Hash Join (cost=30.85..3840.72 rows=5731 width=12)
(actual time=7.279..110.578 rows=5005 loops=1)
Hash Cond: ("outer".player_id = "inner".player_id)
-> Nested Loop (cost=0.00..3723.90 rows=5731
width=8) (actual time=0.106..69.528 rows=5005 loops=1)
-> Index Scan using team_pool_id on team t
(cost=0.00..397.90 rows=391 width=4) (actual time=0.058..2.313 rows=313
loops=1)
Index Cond: (pool_id = 21699)
-> Index Scan using team_id_asdas on
team_players tp (cost=0.00..8.24 rows=21 width=8) (actual
time=0.027..0.122 rows=16 loops=313)
Index Cond: (tp.team_id = "outer".team_id)
-> Hash (cost=28.48..28.48 rows=948 width=4)
(actual time=7.121..7.121 rows=948 loops=1)
-> Seq Scan on player pl (cost=0.00..28.48
rows=948 width=4) (actual time=0.042..3.930 rows=948 loops=1)
-> Hash (cost=130.14..130.14 rows=5414 width=8) (actual
time=43.070..43.070 rows=5414 loops=1)
-> Seq Scan on player_updates pu
(cost=0.00..130.14 rows=5414 width=8) (actual time=0.013..22.410
rows=5414 loops=1)
Total runtime: 1144.059 ms
(17 rows)
---
VS fast query
EXPLAIN ANALYSE
SELECT tp.team_id,SUM(pl.player_id),SUM(goals) as total_goals
FROM
team t JOIN team_players tp ON tp.team_id = t.team_id AND t.pool_id = 21699
JOIN player_updates pu ON pu.player_id = tp.player_id
JOIN player pl ON pl.player_id = pu.player_id
WHERE t.pool_id = 21699
GROUP BY tp.team_id;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=270.31..278.80 rows=566 width=12) (actual
time=409.428..410.551 rows=313 loops=1)
-> Hash Join (cost=89.66..266.06 rows=566 width=12) (actual
time=120.341..278.310 rows=40928 loops=1)
Hash Cond: ("outer".player_id = "inner".player_id)
-> Seq Scan on player_updates pu (cost=0.00..130.14 rows=5414
width=8) (actual time=0.019..16.833 rows=5414 loops=1)
-> Hash (cost=89.48..89.48 rows=73 width=12) (actual
time=120.209..120.209 rows=5005 loops=1)
-> Hash Join (cost=53.16..89.48 rows=73 width=12)
(actual time=78.672..102.434 rows=5005 loops=1)
Hash Cond: ("outer".player_id = "inner".player_id)
-> Seq Scan on player pl (cost=0.00..28.48
rows=948 width=4) (actual time=0.040..3.092 rows=948 loops=1)
-> Hash (cost=52.97..52.97 rows=73 width=8)
(actual time=78.577..78.577 rows=5005 loops=1)
-> Nested Loop (cost=0.00..52.97 rows=73
width=8) (actual time=0.100..57.729 rows=5005 loops=1)
-> Index Scan using team_pool_id on
team t (cost=0.00..10.44 rows=5 width=4) (actual time=0.050..1.591
rows=313 loops=1)
Index Cond: ((pool_id = 21699)
AND (pool_id = 21699))
-> Index Scan using team_id_asdas on
team_players tp (cost=0.00..8.24 rows=21 width=8) (actual
time=0.021..0.099 rows=16 loops=313)
Index Cond: (tp.team_id =
"outer".team_id)
Total runtime: 411.629 ms
(15 rows)
Time: 417.726 ms
I ran ANALYZE on each of the tables, just to be sure, but there was no
changes.
Also, here's some table info:
team => 31720 rows,
team_players => 464896 rows,
player => 948 rows
player_updates => 5414 rows
thanks,
-- Andrew
The explain analyze:
-> Index Scan using team_pool_id on
team t (cost=0.00..10.44 rows=5 width=4) (actual time=0.049..1.556
rows=313 loops=1)
Index Cond: ((pool_id = 21699)
AND (pool_id = 21699))
Here it sees two conditions, and PG is expecting that will restrict
things further. It's clearly not expecting the same condition twice,
but is presumably treating it something like "a>-4 AND a<4". This
actually makes the row estimate wrong.
So - if PG is getting this bit wrong why is query 2 faster?
Looking at the row estimates for case 1 we can see that the final
cases are badly wrong.
> GroupAggregate (cost=8742.52..9853.85 rows=44453 width=13) (actual
> time=1186.973..1432.548 rows=626 loops=1)
Here it's got the number of rows badly wrong - if you actually ended
up with 44,000 rows then maybe this would be the best plan. This is
the last step though, so what happens before?
> -> Sort (cost=8742.52..8853.66 rows=44453 width=13) (actual
> time=1186.237..1309.562 rows=40928 loops=1)
> Sort Key: tp.team_id, (pl."position" = 'G'::bpchar)
We're about right here. The problem must be in the last step, and
looking at the row estimates it seems to think that no summarising is
actually going to take place.
I think the "position = 'G'" bit is confusing it. Well, we can test that:
1. Just group by "pl.position" - remove the test. Do the estimated and
actual rows match now?
2. Rewrite the position='G' as a CASE structure
3. Do the position='G' part in a sub-query and join to that.
The results of those should show whether this is actually the problem.
--
Richard Huxton
Archonet Ltd
.
--
_____________________________________________
G l o b a l D i a g n o s t i c s I n c.
Andrew Schmidt t.416-304-0049 x206
aschmidt@xxxxxxxxxxxxx f.866-697-8726
_____________________________________________
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend