It's no guarantee* that will speed up the query, but "index on WHERE clause fields" is the first thing to look for.
* The problem with "over wide date ranges" is that the query optimizer might decide that the date range is so wide that it's just as cheap -- or cheaper -- to scan the whole table.
On 6/21/23 13:39, Wells Oliver wrote:
It's just this.
CREATE OR REPLACE VIEW vw_pitches AS
SELECT
p.year,
p.game_id,
p.game_date,
p.game_level,
...
from synergy.pitches as p
join alias.identity as idpitcher
on p.pitcher_identity_id = idpitcher.identity_id
left join alias.identity as idcatcher
on p.catcher_identity_id = idcatcher.identity_id
left join alias.identity as idbatter
on p.batter_identity_id = idbatter.identity_id;
The alias.identity.identity_id column is indexed.
The main issue is SELECT COUNT(*) over wide date ranges, which is something we'd like to do frequently.
On Wed, Jun 21, 2023 at 11:37 AM Holger Jakobs <holger@xxxxxxxxxx> wrote:
Am 21.06.23 um 20:31 schrieb Wells Oliver:
> Dead simple date scan across a big-ish table (est. 23,153,666 rows)
>
> explain analyze select count(*) from vw_pitches where game_date >=
> '2022-06-21' and game_date <= '2023-06-21';
>
> The view does do some joins but those don't seem to be the issue to me.
>
> Planner does:
>
> Finalize Aggregate (cost=3596993.88..3596993.89 rows=1 width=8)
> (actual time=69980.491..69982.076 rows=1 loops=1)
> -> Gather (cost=3596993.46..3596993.87 rows=4 width=8) (actual
> time=69979.137..69982.071 rows=5 loops=1)
> Workers Planned: 4
> Workers Launched: 4
> -> Partial Aggregate (cost=3595993.46..3595993.47 rows=1
> width=8) (actual time=69975.136..69975.137 rows=1 loops=5)
> -> Nested Loop (cost=0.44..3591408.37 rows=1834034
> width=0) (actual time=0.882..69875.934 rows=1458419 loops=5)
> -> Parallel Seq Scan on pitches p
> (cost=0.00..3537431.89 rows=1834217 width=12) (actual
> time=0.852..68914.256 rows=1458419 loops=5)
> Filter: ((game_date >= '2022-06-21'::date)
> AND (game_date <= '2023-06-21'::date))
> Rows Removed by Filter: 3212310
> -> Memoize (cost=0.44..0.47 rows=1 width=4)
> (actual time=0.000..0.000 rows=1 loops=7292095)
> Cache Key: p.pitcher_identity_id
> Cache Mode: logical
> Hits: 1438004 Misses: 21042 Evictions: 0
> Overflows: 0 Memory Usage: 2138kB
> Worker 0: Hits: 1429638 Misses: 21010
> Evictions: 0 Overflows: 0 Memory Usage: 2134kB
> Worker 1: Hits: 1456755 Misses: 21435
> Evictions: 0 Overflows: 0 Memory Usage: 2177kB
> Worker 2: Hits: 1433557 Misses: 21201
> Evictions: 0 Overflows: 0 Memory Usage: 2154kB
> Worker 3: Hits: 1428727 Misses: 20726
> Evictions: 0 Overflows: 0 Memory Usage: 2105kB
> -> Index Only Scan using identity_pkey on
> identity idpitcher (cost=0.43..0.46 rows=1 width=4) (actual
> time=0.007..0.007 rows=1 loops=105414)
> Index Cond: (identity_id =
> p.pitcher_identity_id)
> Heap Fetches: 83
> Planning Time: 1.407 ms
> Execution Time: 69982.927 ms
>
> Is there something to be done here? Kind of a frequent style of query
> and quite slow.
>
Could you provide the definition of the view(s) down to the base tables?
--
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
--
Wells Oliver
wells.oliver@xxxxxxxxx
--
Born in Arizona, moved to Babylonia.
Born in Arizona, moved to Babylonia.