Hi
I ran into some performance problems regarding foreign keys lately.
My schema has about 20 tables, which each contain from 10 to 100.000 records. They have quite complicated interdependencies, modeled using
foregin keys set to "on update cascade, on delete cascade".
The schema stores data for multiple customers - Recently I wanted
to extract the data for just a single customer. I duplicated the schema,
and deleted all but one customer from the "customer" table. This worked
as expected, but the delete took a few hours (!) on a moderatly fast machine (dual 1GHz PIII, RAID5-Array for postgres-data).
As far as I understand the code, foreign keys are implemented using triggers, which in case of "on delete cascade" delete from the "slave"-table when a record from the "master"-table is deleted. I guess
that the bad performance is due to running this trigger about 100.1000 times when deleting a lot of rows from a large table - and each time it has to find referencing tuples by doing an index scan - so the performance is about as bad as it would be if "delete * from table" would use an index scan.
Since postgres already incoporates code to check foreign keys more efficiently (when doing alter table ... add constraint .. foreign key, postgres seems to use a merge or a hash join, instead of a nested loop),
I wondered how hard it would be to use this for the triggers too.
I imagined creating a statement-level trigger in parallel to the row-level triggers, and defining some threshold (let's say, more than 10% of the rows deleted). If the threshold is reached, the row-level trigger would just do nothing, and the statement-level trigger would delete the referencing records doing a join.
Would this be feasable? And would it be something a newbie could tackle, or is it more involved than I think?
greetings, Florian Pflug
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature