Re: Poor performance due to parallel seq scan on indexed date field

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



It's just this.

CREATE OR REPLACE VIEW vw_pitches AS
SELECT
    p.year,
    p.game_id,
    p.game_date,
    p.game_level,
    ...
from synergy.pitches as p
join alias.identity as idpitcher
    on p.pitcher_identity_id = idpitcher.identity_id
left join alias.identity as idcatcher
    on p.catcher_identity_id = idcatcher.identity_id
left join alias.identity as idbatter
    on p.batter_identity_id = idbatter.identity_id;

The alias.identity.identity_id column is indexed.

The main issue is SELECT COUNT(*) over wide date ranges, which is something we'd like to do frequently.


On Wed, Jun 21, 2023 at 11:37 AM Holger Jakobs <holger@xxxxxxxxxx> wrote:

Am 21.06.23 um 20:31 schrieb Wells Oliver:
> Dead simple date scan across a big-ish table (est. 23,153,666 rows)
>
> explain analyze select count(*) from vw_pitches where game_date >=
> '2022-06-21' and game_date <= '2023-06-21';
>
> The view does do some joins but those don't seem to be the issue to me.
>
> Planner does:
>
> Finalize Aggregate  (cost=3596993.88..3596993.89 rows=1 width=8)
> (actual time=69980.491..69982.076 rows=1 loops=1)
>   ->  Gather  (cost=3596993.46..3596993.87 rows=4 width=8) (actual
> time=69979.137..69982.071 rows=5 loops=1)
>         Workers Planned: 4
>         Workers Launched: 4
>         ->  Partial Aggregate  (cost=3595993.46..3595993.47 rows=1
> width=8) (actual time=69975.136..69975.137 rows=1 loops=5)
>               ->  Nested Loop  (cost=0.44..3591408.37 rows=1834034
> width=0) (actual time=0.882..69875.934 rows=1458419 loops=5)
>                     ->  Parallel Seq Scan on pitches p
>  (cost=0.00..3537431.89 rows=1834217 width=12) (actual
> time=0.852..68914.256 rows=1458419 loops=5)
>                           Filter: ((game_date >= '2022-06-21'::date)
> AND (game_date <= '2023-06-21'::date))
>                           Rows Removed by Filter: 3212310
>                     ->  Memoize  (cost=0.44..0.47 rows=1 width=4)
> (actual time=0.000..0.000 rows=1 loops=7292095)
>                           Cache Key: p.pitcher_identity_id
>                           Cache Mode: logical
>                           Hits: 1438004  Misses: 21042  Evictions: 0
>  Overflows: 0  Memory Usage: 2138kB
>                           Worker 0:  Hits: 1429638  Misses: 21010
>  Evictions: 0  Overflows: 0  Memory Usage: 2134kB
>                           Worker 1:  Hits: 1456755  Misses: 21435
>  Evictions: 0  Overflows: 0  Memory Usage: 2177kB
>                           Worker 2:  Hits: 1433557  Misses: 21201
>  Evictions: 0  Overflows: 0  Memory Usage: 2154kB
>                           Worker 3:  Hits: 1428727  Misses: 20726
>  Evictions: 0  Overflows: 0  Memory Usage: 2105kB
>                           ->  Index Only Scan using identity_pkey on
> identity idpitcher  (cost=0.43..0.46 rows=1 width=4) (actual
> time=0.007..0.007 rows=1 loops=105414)
>                                 Index Cond: (identity_id =
> p.pitcher_identity_id)
>                                 Heap Fetches: 83
> Planning Time: 1.407 ms
> Execution Time: 69982.927 ms
>
> Is there something to be done here? Kind of a frequent style of query
> and quite slow.
>
Could you provide the definition of the view(s) down to the base tables?

--
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012



--
Wells Oliver
wells.oliver@xxxxxxxxx

[Index of Archives]     [Postgresql Home]     [Postgresql General]     [Postgresql Performance]     [Postgresql PHP]     [Postgresql Jobs]     [PHP Users]     [PHP Databases]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Yosemite Forum]

  Powered by Linux