Hi
Using view which does a join in two tables to access things in one of them is not a good idea. You pay for a join even though it is not necessary. So if you just want distinct game_id and you don’t care if its play_uuid is in joints, then you can accelerate using the table plays directly.
Michel SALAIS
De : Wells Oliver <wells.oliver@xxxxxxxxx>
Envoyé : vendredi 10 décembre 2021 23:53
À : pgsql-admin <pgsql-admin@xxxxxxxxxxxxxx>
Objet : Very slow 101-feeling design/query..
This feels very 101 but I feel like it should be much faster:
A table "joints" with a PK of play_uuid, target_id, joint_seq, joint_timestamp.
"joints" is partitioned using RANGE on joint_timestamp for monthly partitions 1/1 - 2/1, 2-1 - 3/1, etc.
"joints" has an FK where play_uuid refers to table "plays" and column "play_uuid" where "play_uuid" is the PK.
"plays" additionally has an indexed column game_id.
"joints" has 1133,932,391 rows across 12 monthly partitions for 2021, and "plays has 585,627 rows. We made a view called "vw_joints" which just does:
SELECT * FROM joints JOIN plays USING (play_id);
Then doing:
SELECT DISTINCT game_id FROM vw_joints
Takes 35-45 minutes. Which seems nuts. We do this kind of design in a few different plays to normalize things, but it comes at the cost of these agonizingly slow (and seemingly dead simple) qeuries.
Is there any optimization to do here beyond flattening table and de-normalizing data? Is the partitioning causing a slowness here? I feel like partitioning is creating some difficulty...
wells.oliver@xxxxxxxxx