Composite Foreign Key performance

Leonard Boyce <leonard.boyce@xxxxxxxxxxxxxx> · Wed, 22 Jan 2014 13:02:56 -0500

Apologies for the previous partial post earlier, fat-fingered today.
If this is a repost I apologize in advance as I could not see my reply
in the archive after a couple of hours.

I currently have;
 - lookup table (lookup_id BYTEA PK, status TEXT, last_update
TSTAMPTZ, ...) (PK index is > 20GB)
 - data table (data_id BIGINT PK, customer_id BIGINT, lookup_id BYTEA,
...) which is partitioned by customer_id (there is currently no FK set
up for lookup_id)

> A bytea as the PK? That sounds horrible.

Reason for the bytea: benching it gave far better results as far as
size went (index and on-disk size). Every query is keyed on this value
(decoded from external data)

The lookup table currently contains ~ 350mm records.
We receive ~ 10mm new customer 'data' records per day, and there is ~
8% likelyhood of overlap with other customer records.

For every set of data imported, we export those records with extra
data appended, including the 'status' and other fields.

Right now we perform join btw data and lookup when extracting the
records, which is quite an expensive query.

What I'm thinking of doing is extending the data table to include
those fields from lookup which are always exported and creating a
composite foreign key so that any updates to the lookup record are
propagated to any matching records in the various data_X partitions.

This would involve converting PK on lookup to (lookup_id, status, last_update)
adding fields (status, last_update) to the data parent table
syncing/copying the relevant data from the lookup table into matching
data_X tables
creating new index (lookup_id, status, last_update) on each data_X child table;
creating new FK (lookup_id, status, last_update) on each data_X child
table with ON UPDATE CASCADE ON DELETE RESTRICT;

I'm aware of the extra space this would require, and am really wondering if:
a) there is a potential improvement in the select from data_X
operations when not using the join
b) there is a significant overhead in having the foreign keys

Looking for any initial advice from experience before I set up huge
test to measure.

Thanks,
Leonard

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general