Search Postgresql Archives

Re: Partitioning and unique key

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sun, 1 Sept 2024 at 03:58, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:

The model is at odds with itself and untenable. If the tables hold
multiple rows for a given transaction_id  then you cannot have a
PK/Unique constraint on that column. Seems there is a decided lack of
any planning. The only way I can see this happening is consolidating all
the duplicate transaction_id rows into a single row for each
transaction_id. That then leads to the question of how to do that and
retain the 'correct' information from the selection of rows for each
transaction_id.


Yes we had messed up the data for now and have multiple records for each transaction_id persisted and thus we need to fix the data. But more than that , as I stated , I wanted to understand first 
1) if it's technically possible to have a unique key on only the transaction_id column having the partition key on the transaction_timestamp, because the table is going to be queried/purged based on the transaction_timestamp?

2) Additionally we were thinking if above is technically not possible, then the maximum granularity which we can have for each transaction_id will be a day, so the partition key transaction_timestmp can be truncated to have only date component but no time component. So the primary key will be (transaction_id, transaction_date). But we also don't want to lose the time component and persist the existing data of transaction_timestmp (which will have a time component in it, in a separate column).

And in above case , for fixing the existing data in least disruptive way, as we have currently duplicate transaction_id inserted into the table already because of the composite primary key(transaction_id, transaction_timestmp).Can we simply
 i)rename the existing column transaction_timestmp to transaction_date and then add new column transaction_timestmp using the values of existing column partition by partition. 
ii)And then delete the duplicate data using query something as below , each partition by partition. 
iii)And then alter the datatype of the partition key transaction_date to DATE in one shot at the table level(which should be fast as its having more granularity as compare to existing timestamptype, so should be catalog or dictionary change only), and that will remain the part of composite PK (transaction_id,transaction_date).
iv) Repeat this step for all child partition tables and then for the parent partition tables. 

Will this technique be the most efficient way of fixing this mess? 

WITH ranked_records AS (
            SELECT column1_id, column2_timestamptz,
                   ROW_NUMBER() OVER (PARTITION BY column1_id, date_trunc('day', column2_timestamptz)
                                      ORDER BY column2_timestamptz DESC) AS rn
            FROM partition_name
        )
        DELETE FROM partition_name T1
        WHERE EXISTS (
            SELECT 1
            FROM ranked_records T2
            WHERE T1.column1_id = T2.column1_id
              AND T1.column2_timestamptz = T2.column2_timestamptz
              AND T2.rn > 1
        )


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux