Re: Insert query performance

David Rowley <dgrowleyml@xxxxxxxxx> · Tue, 20 Aug 2024 23:43:13 +1200

On Tue, 20 Aug 2024 at 19:09, sud <suds1434@xxxxxxxxx> wrote:
> However, my initial understanding of "having the FK index will improve the insert performance in the child table" is not accurate it seems. Rather as you mentioned it may negatively impact the loading/insert performance because it has to now update the additional index in each insert. In case of insert into child table, to ensure if the child row is already present in the parent ,  it just scans the parent by the Primary key of the parent table (which is be default indexed) and thus it doesn't need an index in the child table foreign keys or having an index in the foreign key in the child table won't help the constraint validation faster. Please correct me if my understanding is wrong here.

If you think about what must happen when you insert into the
referencing table, the additional validation that the foreign key must
do is check that a corresponding record exists in the referenced
table. An index on the referencing table does not help speed that up.

> Additionally as you mentioned "explain analyze" will show a section on how much time it really takes for the constraint validation , I can see that section now. But it seems it will really need that INSERT statement to be executed and that we can't really do in production as that will physically insert data into the table. So do you mean to just do the "explain analyze" for the INSERT query and capture the plan and then do the rollback?  And in our case it's a row by row insert happening , so we will see if we can club/sum that "constraint validation" time for a handful if insert somehow to get a better idea on the percentage of time we really spent in the constraint validation.

I'd recommend performing a schema-only dump of your production
database and experimenting well away from production. See pg_dump
--schema-only.

I also recommend not leaving performance to chance and testing the
impact of index vs no index away from production with some data loaded
that is representative of your production data (or use the production
data if it's available and small enough to manage).  Use pgbench to
see what impact having the index on the referencing table has on
performance on inserts into that table vs what improvements you gain
from having the index when there's cascading delete from the
referenced table.

You might also want to look into auto_explain [1].  You can load this
into a single session and set auto_explain.log_min_duration = 0,
auto_explain.log_analyze = on and auto_explain.log_nested_statements =
on. That should give you the plan for the cascade DELETE query that's
executed by the trigger when you perform the DELETE on the referenced
table. (Also see the note about auto_explain.log_timing)

David

[1] https://www.postgresql.org/docs/15/auto-explain.html