On 12/10/21 5:27 PM, Wells Oliver
wrote:
PG 13.4. Can't quite run the EXPLAIN ANALYZE since it takes so long, but EXPLAIN SELECT DISTINCT game_id FROM vw_joints shows (s = joints, p = plays)
----------------------------------------------------------------------------------------------------
HashAggregate (cost=63810150.11..63810168.40 rows=1829 width=4)
Group Key: p.game_id
-> Hash Left Join (cost=21647.78..60977838.19 rows=1132924766 width=4)
Hash Cond: (s.play_uuid = p.play_uuid)
-> Append (cost=0.00..57982241.49 rows=1132924766 width=16)
-> Seq Scan on joints_2021_01 s_1 (cost=0.00..13.00 rows=300 width=16)
-> Seq Scan on joints_2021_02 s_2 (cost=0.00..13.00 rows=300 width=16)
-> Seq Scan on joints_2021_03 s_3 (cost=0.00..13.00 rows=300 width=16)
-> Seq Scan on joints_2021_04 s_4 (cost=0.00..2217902.16 rows=49859816 width=16)
-> Seq Scan on joints_2021_05 s_5 (cost=0.00..2965019.35 rows=63440735 width=16)
-> Seq Scan on joints_2021_06 s_6 (cost=0.00..3797848.89 rows=85688889 width=16)
-> Seq Scan on joints_2021_07 s_7 (cost=0.00..5867829.24 rows=115975424 width=16)
-> Seq Scan on joints_2021_08 s_8 (cost=0.00..17274328.41 rows=380175741 width=16)
-> Seq Scan on joints_2021_09 s_9 (cost=0.00..18226427.88 rows=393209088 width=16)
-> Seq Scan on joints_2021_10 s_10 (cost=0.00..1942824.01 rows=44003201 width=16)
-> Seq Scan on joints_2021_11 s_11 (cost=0.00..25385.72 rows=570672 width=16)
-> Seq Scan on joints_2021_12 s_12 (cost=0.00..13.00 rows=300 width=16)
-> Hash (cost=14292.90..14292.90 rows=588390 width=20)
-> Seq Scan on plays p (cost=0.00..14292.90 rows=588390 width=20)
JIT:
Functions: 34
Options: Inlining true, Optimization true, Expressions true, Deforming true
(22 rows)
On Fri, Dec 10, 2021 at 3:17 PM Ron <ronljohnsonjr@xxxxxxxxx> wrote:
On 12/10/21 4:53 PM, Wells Oliver wrote:
> This feels very 101 but I feel like it should be much faster:
>
> A table "joints" with a PK of play_uuid, target_id, joint_seq,
> joint_timestamp.
>
> "joints" is partitioned using RANGE on joint_timestamp for monthly
> partitions 1/1 - 2/1, 2-1 - 3/1, etc.
>
> "joints" has an FK where play_uuid refers to table "plays" and
> column "play_uuid" where "play_uuid" is the PK.
>
> "plays" additionally has an indexed column game_id.
>
> "joints" has 1133,932,391 rows across 12 monthly partitions for 2021, and
> "plays has 585,627 rows. We made a view called "vw_joints" which just does:
>
> SELECT * FROM joints JOIN plays USING (play_id);
>
> Then doing:
>
> SELECT DISTINCT game_id FROM vw_joints
>
> Takes 35-45 minutes. Which seems nuts. We do this kind of design in a few
> different plays to normalize things, but it comes at the cost of these
> agonizingly slow (and seemingly dead simple) qeuries.
>
> Is there any optimization to do here beyond flattening table and
> de-normalizing data? Is the partitioning causing a slowness here? I feel
> like partitioning is creating some difficulty...
What Postgresql version?
What does the query plan look like?
--
Angular momentum makes the world go 'round.
--
Wells Oliver
wells.oliver@xxxxxxxxx
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.