Presumably I could partition proposal on has_been_anonymised, however
the row counts seem low enough that it feels a bit like overkill? We
also need referential integrity so I'll need to wait until that's in
(I think it's coming in PG12?)
If I decrease the number of legacy_organisation_id’s that are being
used then the query performance gets much better, but presumably
that’s because there’s a smaller dataset.
What are the actual counts that your queries are returning?
For your first query at least, are you sure your issue is not simply that you have no index on proposal.proposal.reference? Because the entry_time filter is highly selective (and that part of the query only took 180ms), I would think the planner would first filter on the note table, then join back to proposal.proposal using an index scan on reference. But you have no index there. You might even consider an index on (reference) WHERE has_been_anonymised = false?
Also, one of your challenges seems to be that all of your indexed fields are low cardinality. Rather than partitioning on has_been_anonymised, perhaps you could consider partitioning on system_id and sub-partition on legacy_organisation_id? It depends on if your attached queries are always the standard pattern or not though. This is something you might play around with.
Another option is to try yet further specificity in your partial index conditions, and also to only then index your primary key. For example:
CREATE INDEX ON proposal.proposal (id)
WHERE has_been_anonymised = false AND system_id = 11;
I'm curious if any of these ideas would make a difference.
Thanks,
Jeremy
Jeremy