On 8/31/24 13:02, veem v wrote:
Hello,
We have our transaction tables daily range partitioned on the
transaction_timestamp column, which is of timestamptz data type, with a
composite primary key on (transaction_id, transaction_timestamp). We
were using INSERT ... ON CONFLICT for loading data into our system,
which means that if another record arrives with the same transaction_id
and transaction_timestamp, it gets updated. This way we already have 60
days' worth of data stored in our system, with approx. 70 million
transactions per day.
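For readers following along, the setup described above looks roughly like this; the table and column names are from the post, but the exact DDL and the amount column are assumptions:

```sql
-- Sketch of the described setup: daily range-partitioned table with a
-- composite primary key on (transaction_id, transaction_timestamp).
CREATE TABLE transactions (
    transaction_id        bigint      NOT NULL,
    transaction_timestamp timestamptz NOT NULL,
    amount                numeric,    -- stand-in payload column
    PRIMARY KEY (transaction_id, transaction_timestamp)
) PARTITION BY RANGE (transaction_timestamp);

CREATE TABLE transactions_2024_08_31 PARTITION OF transactions
    FOR VALUES FROM ('2024-08-31') TO ('2024-09-01');

-- The load path as described: update when the same
-- (transaction_id, transaction_timestamp) pair arrives again.
INSERT INTO transactions (transaction_id, transaction_timestamp, amount)
VALUES (1, '2024-08-31 13:02:00+00', 100.00)
ON CONFLICT (transaction_id, transaction_timestamp)
DO UPDATE SET amount = EXCLUDED.amount;
```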
But we just got to know from the business that the data should be unique
by transaction_id only, not by transaction_timestamp. Any incoming row
with the same transaction_id (even with a different
transaction_timestamp) should get updated, not inserted.
Also, these daily partitions are going to hold ~400 million rows in
future and will be queried with a filter on transaction_timestamp, so we
can't really avoid partitioning here considering future growth.
But due to a PostgreSQL limitation we are unable to put this unique
constraint or primary key on the transaction_id column alone; we have to
include transaction_timestamp with it as a composite key. So I want to
understand from the experts whether there is any possible way to satisfy
both partitioning on the transaction_timestamp column and a unique key
or PK on transaction_id only?
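For reference, this is the limitation being described: on a partitioned table, every unique constraint (including the primary key) must contain all of the partition key columns, so a constraint on transaction_id alone is rejected:

```sql
-- On a table partitioned by RANGE (transaction_timestamp), this fails:
ALTER TABLE transactions
    ADD CONSTRAINT transactions_id_uq UNIQUE (transaction_id);
-- ERROR:  unique constraint on partitioned table must include all
-- partitioning columns
```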
The model is at odds with itself and untenable. If the tables hold
multiple rows for a given transaction_id then you cannot have a
PK/Unique constraint on that column. Seems there is a decided lack of
any planning. The only way I can see this happening is consolidating all
the duplicate transaction_id rows into a single row for each
transaction_id. That then leads to the question of how to do that and
retain the 'correct' information from the selection of rows for each
transaction_id.
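A minimal sketch of the consolidation described above, assuming (and this is exactly the open question) that "correct" means the row with the latest transaction_timestamp wins:

```sql
-- One row per transaction_id; DISTINCT ON keeps the first row of each
-- group in the ORDER BY, here the most recent timestamp.
CREATE TABLE transactions_dedup AS
SELECT DISTINCT ON (transaction_id) *
FROM   transactions
ORDER  BY transaction_id, transaction_timestamp DESC;
```

Any other definition of "correct" (first seen, largest amount, etc.) just changes the ORDER BY.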
Note: it's a PostgreSQL 15.4 database.
Regards
Veem
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx