Re: Poor performance due to parallel seq scan on indexed date field

Ron <ronljohnsonjr@xxxxxxxxx> · Wed, 21 Jun 2023 14:14:46 -0500



    Is there an index on synergy.pitches.game_date?

    
    It's no guarantee* that will speed up the query, but "index on WHERE
    clause fields" is the first thing to look for.

    
    * The problem with "over wide date ranges" is that the query
    optimizer might decide that the date range is so wide that it's just
    as cheap -- or cheaper -- to scan the whole table.

    
    On 6/21/23 13:39, Wells Oliver wrote:

    
      It's just this.
        

        CREATE OR REPLACE VIEW vw_pitches AS 

          SELECT

              p.year,

              p.game_id,

              p.game_date,

              p.game_level,

              ...

          from synergy.pitches as p

          join alias.identity as idpitcher

              on p.pitcher_identity_id = idpitcher.identity_id

          left join alias.identity as idcatcher

              on p.catcher_identity_id = idcatcher.identity_id

          left join alias.identity as idbatter

              on p.batter_identity_id = idbatter.identity_id;

        
        The alias.identity.identity_id column is indexed.
        

        The main issue is SELECT COUNT(*) over wide date ranges,
          which is something we'd like to do frequently.
        

        On Wed, Jun 21, 2023 at
          11:37 AM Holger Jakobs <holger@xxxxxxxxxx>
          wrote:

        
          Am 21.06.23 um 20:31 schrieb Wells Oliver:

          > Dead simple date scan across a big-ish table
          (est. 23,153,666 rows)

          >

          > explain analyze select count(*) from vw_pitches where
          game_date >= 

          > '2022-06-21' and game_date <= '2023-06-21';

          >

          > The view does do some joins but those don't seem to be
          the issue to me.

          >

          > Planner does:

          >

          > Finalize Aggregate  (cost=3596993.88..3596993.89 rows=1
          width=8) 

          > (actual time=69980.491..69982.076 rows=1 loops=1)

          >   ->  Gather  (cost=3596993.46..3596993.87 rows=4
          width=8) (actual 

          > time=69979.137..69982.071 rows=5 loops=1)

          >         Workers Planned: 4

          >         Workers Launched: 4

          >         ->  Partial Aggregate
           (cost=3595993.46..3595993.47 rows=1 

          > width=8) (actual time=69975.136..69975.137 rows=1
          loops=5)

          >               ->  Nested Loop  (cost=0.44..3591408.37
          rows=1834034 

          > width=0) (actual time=0.882..69875.934 rows=1458419
          loops=5)

          >                     ->  Parallel Seq Scan on pitches p
          

          >  (cost=0.00..3537431.89 rows=1834217 width=12) (actual 

          > time=0.852..68914.256 rows=1458419 loops=5)

          >                           Filter: ((game_date >=
          '2022-06-21'::date) 

          > AND (game_date <= '2023-06-21'::date))

          >                           Rows Removed by Filter: 3212310

          >                     ->  Memoize  (cost=0.44..0.47
          rows=1 width=4) 

          > (actual time=0.000..0.000 rows=1 loops=7292095)

          >                           Cache Key:
          p.pitcher_identity_id

          >                           Cache Mode: logical

          >                           Hits: 1438004  Misses: 21042
           Evictions: 0 

          >  Overflows: 0  Memory Usage: 2138kB

          >                           Worker 0:  Hits: 1429638
           Misses: 21010 

          >  Evictions: 0  Overflows: 0  Memory Usage: 2134kB

          >                           Worker 1:  Hits: 1456755
           Misses: 21435 

          >  Evictions: 0  Overflows: 0  Memory Usage: 2177kB

          >                           Worker 2:  Hits: 1433557
           Misses: 21201 

          >  Evictions: 0  Overflows: 0  Memory Usage: 2154kB

          >                           Worker 3:  Hits: 1428727
           Misses: 20726 

          >  Evictions: 0  Overflows: 0  Memory Usage: 2105kB

          >                           ->  Index Only Scan using
          identity_pkey on 

          > identity idpitcher  (cost=0.43..0.46 rows=1 width=4)
          (actual 

          > time=0.007..0.007 rows=1 loops=105414)

          >                                 Index Cond: (identity_id
          = 

          > p.pitcher_identity_id)

          >                                 Heap Fetches: 83

          > Planning Time: 1.407 ms

          > Execution Time: 69982.927 ms

          >

          > Is there something to be done here? Kind of a frequent
          style of query 

          > and quite slow.

          >

          Could you provide the definition of the view(s) down to the
          base tables?

          
          -- 

          Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

          
      -- 

      
          Wells Oliver

            wells.oliver@xxxxxxxxx
        
      
    -- 

      Born in Arizona, moved to Babylonia.