On 1/12/24 6:34 AM, Jeremiah Bauer wrote:
That'd be a band-aid at best, because we know that the query used to
define the materialized view runs in a reasonable amount of time on it's
own, as does a CTAS. So either the REFRESH is doing something odd when
writing into the new relation (which looking at the code seems very
unlikely), or REFRESH is getting a different query plan for some reason.
Unfortunately, I don't know of any easy way to get the query plan for
the REFRESH (it might be possible via gdb, but I'm not sure). We do at
least know that the REFRESH is using parallel workers.
Can you post the output of EXPLAIN ANALYZE for the SELECT? That might
provide some clues.
Sure, here is the explain analyze for the select:
HashAggregate (cost=123986470.20..129486707.63 rows=164493082 width=24)
(actual time=697250.385..741548.965 rows=59015171 loops=1)
Group Key: id1, id2, id3
Planned Partitions: 64 Batches: 65 Memory Usage: 328209kB Disk
Usage: 6750176kB
-> Gather (cost=64653301.50..107228737.47 rows=328986164 width=24)
(actual time=357598.331..594226.355 rows=161151623 loops=1)
Workers Planned: 2
Workers Launched: 2
-> HashAggregate (cost=64652301.50..74329121.07
rows=164493082 width=24) (actual time=357572.082..578038.457
rows=53717208 loops=3)
Group Key: id1, id2, id3
Planned Partitions: 64 Batches: 65 Memory Usage:
328209kB Disk Usage: 25774088kB
Worker 0: Batches: 65 Memory Usage: 328209kB Disk
Usage: 25375784kB
Worker 1: Batches: 65 Memory Usage: 328209kB Disk
Usage: 25382936kB
-> Parallel Seq Scan on large_table
(cost=0.00..29740358.40 rows=685387840 width=24) (actual
time=12.954..99596.289 rows=548310252 loops=3)
Planning Time: 5.380 ms
Execution Time: 745750.371 ms
Ok, so that's using a parallel query as well, just like REFRESH, so no
help there.
At this point I think you'd need to do some OS-level investigation using
a tool like perf to determine what the parent process is doing once the
workers finish.
BTW, I did just come across a script[1] that will print the query plan
for a running query. Based on what you've said I suspect it wouldn't
tell us much here, but I wanted to mention it.
1: https://github.com/StarfishStorage/explain-running-query
--
Jim Nasby, Data Architect, Austin TX