Re: select count(idn) is slow (Seq Scan) instead of Bitmap Heap.. why?

Robert Haas <robertmhaas@xxxxxxxxx> · Tue, 4 Aug 2009 09:46:33 -0400

2009/8/2 Adam PAPAI <wooh@xxxxxxx>:
> Hello,
>
>
> I have a problem with an inner join + count().
>
> my query is:
>
> explain analyze select
> k.idn,k.kerdes_subject,k.kerdes_text,u.vezeteknev,u.keresztnev,u.idn as
> user_id, kg.kategoria_neve, count(v.idn)
>
> FROM kategoriak as kg
>
> INNER JOIN kerdesek as k on kg.idn = k.kategoria_id
> INNER JOIN users as u ON k.user_id = u.idn
> INNER JOIN valaszok as v ON k.idn = v.kerdes_id
>
> where kg.idn=15 group by k.idn, k.kerdes_subject,k.kerdes_text,
> u.idn,u.vezeteknev,u.keresztnev,kg.kategoria_neve
>
> The problem is with the count(v.idn).
>
> This column has a relation with: v.kerdes_id = k.idn => k.kategoria_id =
>  kg.idn
>
> and the WHERE says: kg.idn = 15.
>
> Why does it run through all lines in v?
>
> the explain sais:
>
>  GroupAggregate  (cost=103238.59..103602.66 rows=10402 width=1382) (actual
> time=8531.405..8536.633 rows=73 loops=1)
>   ->  Sort  (cost=103238.59..103264.59 rows=10402 width=1382) (actual
> time=8531.339..8533.199 rows=1203 loops=1)
>         Sort Key: k.idn, k.kerdes_subject, k.kerdes_text, u.idn,
> u.vezeteknev, u.keresztnev, kg.kategoria_neve
>         ->  Hash Join  (cost=3827.79..89951.54 rows=10402 width=1382)
> (actual time=1778.590..8523.015 rows=1203 loops=1)
>               Hash Cond: (v.kerdes_id = k.idn)
>               ->  Seq Scan on valaszok v  (cost=0.00..78215.98 rows=2080998
> width=8) (actual time=59.714..5009.171 rows=2080998 loops=1)
>               ->  Hash  (cost=3823.42..3823.42 rows=350 width=1378) (actual
> time=12.553..12.553 rows=74 loops=1)
>                     ->  Nested Loop  (cost=14.98..3823.42 rows=350
> width=1378) (actual time=0.714..12.253 rows=74 loops=1)
>                           ->  Nested Loop  (cost=14.98..1056.38 rows=350
> width=830) (actual time=0.498..5.952 rows=117 loops=1)
>                                 ->  Seq Scan on kategoriak kg
> (cost=0.00..1.30 rows=1 width=278) (actual time=0.066..0.076 rows=1 loops=1)
>                                       Filter: (idn = 15)
>                                 ->  Bitmap Heap Scan on kerdesek k
> (cost=14.98..1051.58 rows=350 width=560) (actual time=0.374..5.430 rows=117
> loops=1)
>                                       Recheck Cond: (15 = kategoria_id)
>                                       ->  Bitmap Index Scan on
> kategoria_id_id_idx  (cost=0.00..14.89 rows=350 width=0) (actual
> time=0.212..0.212 rows=117 loops=1)
>                                             Index Cond: (15 = kategoria_id)
>                           ->  Index Scan using users_pkey on users u
> (cost=0.00..7.89 rows=1 width=552) (actual time=0.047..0.048 rows=1
> loops=117)
>                                 Index Cond: (k.user_id = u.idn)
>  Total runtime: 8536.936 ms
>
>
>
> So it run through more than 2 mill lines... but why? It should only count
> those lines which has the category_id = 15...
>
> What am I doing wrong?

Well, I'm not sure if you're doing anything wrong, but you're
definitely thinking about it wrong.  There's no way to skip the lines
in v that have kg.idn != 15 just by looking at v, because the idn
column is in kg, not in v.  Obviously you have to look through kg
first and find the lines where kg.idn = 15.  Or since kg.idn =
k.kategoria_id, you can alternatively start by scanning k for
kategoria_id = 15, which is what the planner chose to do here.  Once
you know which lines from k you need, then you can go through v and
look for lines that have a match in k based on the join condition
k.idn = v.kerdes_id.

Do you have an index on valaszok (kerdes_id)?  Might be worth investigating.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance