2009/8/2 Adam PAPAI <wooh@xxxxxxx>: > Hello, > > > I have a problem with an inner join + count(). > > my query is: > > explain analyze select > k.idn,k.kerdes_subject,k.kerdes_text,u.vezeteknev,u.keresztnev,u.idn as > user_id, kg.kategoria_neve, count(v.idn) > > FROM kategoriak as kg > > INNER JOIN kerdesek as k on kg.idn = k.kategoria_id > INNER JOIN users as u ON k.user_id = u.idn > INNER JOIN valaszok as v ON k.idn = v.kerdes_id > > where kg.idn=15 group by k.idn, k.kerdes_subject,k.kerdes_text, > u.idn,u.vezeteknev,u.keresztnev,kg.kategoria_neve > > The problem is with the count(v.idn). > > This column has a relation with: v.kerdes_id = k.idn => k.kategoria_id = > kg.idn > > and the WHERE says: kg.idn = 15. > > Why does it run through all lines in v? > > the explain sais: > > GroupAggregate (cost=103238.59..103602.66 rows=10402 width=1382) (actual > time=8531.405..8536.633 rows=73 loops=1) > -> Sort (cost=103238.59..103264.59 rows=10402 width=1382) (actual > time=8531.339..8533.199 rows=1203 loops=1) > Sort Key: k.idn, k.kerdes_subject, k.kerdes_text, u.idn, > u.vezeteknev, u.keresztnev, kg.kategoria_neve > -> Hash Join (cost=3827.79..89951.54 rows=10402 width=1382) > (actual time=1778.590..8523.015 rows=1203 loops=1) > Hash Cond: (v.kerdes_id = k.idn) > -> Seq Scan on valaszok v (cost=0.00..78215.98 rows=2080998 > width=8) (actual time=59.714..5009.171 rows=2080998 loops=1) > -> Hash (cost=3823.42..3823.42 rows=350 width=1378) (actual > time=12.553..12.553 rows=74 loops=1) > -> Nested Loop (cost=14.98..3823.42 rows=350 > width=1378) (actual time=0.714..12.253 rows=74 loops=1) > -> Nested Loop (cost=14.98..1056.38 rows=350 > width=830) (actual time=0.498..5.952 rows=117 loops=1) > -> Seq Scan on kategoriak kg > (cost=0.00..1.30 rows=1 width=278) (actual time=0.066..0.076 rows=1 loops=1) > Filter: (idn = 15) > -> Bitmap Heap Scan on kerdesek k > (cost=14.98..1051.58 rows=350 width=560) (actual time=0.374..5.430 rows=117 > loops=1) > Recheck Cond: (15 = kategoria_id) > -> Bitmap Index Scan on > kategoria_id_id_idx (cost=0.00..14.89 rows=350 width=0) (actual > time=0.212..0.212 rows=117 loops=1) > Index Cond: (15 = kategoria_id) > -> Index Scan using users_pkey on users u > (cost=0.00..7.89 rows=1 width=552) (actual time=0.047..0.048 rows=1 > loops=117) > Index Cond: (k.user_id = u.idn) > Total runtime: 8536.936 ms > > > > So it run through more than 2 mill lines... but why? It should only count > those lines which has the category_id = 15... > > What am I doing wrong? Well, I'm not sure if you're doing anything wrong, but you're definitely thinking about it wrong. There's no way to skip the lines in v that have kg.idn != 15 just by looking at v, because the idn column is in kg, not in v. Obviously you have to look through kg first and find the lines where kg.idn = 15. Or since kg.idn = k.kategoria_id, you can alternatively start by scanning k for kategoria_id = 15, which is what the planner chose to do here. Once you know which lines from k you need, then you can go through v and look for lines that have a match in k based on the join condition k.idn = v.kerdes_id. Do you have an index on valaszok (kerdes_id)? Might be worth investigating. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance