On Thu, Feb 8, 2024 at 12:42 AM sud <suds1434@xxxxxxxxx> wrote:...The key transaction table is going to have ~450 Million transactions per day and the data querying/filtering will always happen based on the "transaction date" column....Should we go for simple daily range partitioning on the transaction_date column?This one gets my vote. That and some good indexes.Cheers,Greg
Hello Greg,
Out of curiosity, As OP mentioned that there will be Joins and also filters on column Customer_id column , so why don't you think that subpartition by customer_id will be a good option? I understand List subpartition may not be an option considering the new customer_ids gets added slowly in the future(and default list may not be allowed) and also OP mentioned, there is skewed distribution of data for customer_id column. However what is the problem if OP will opt for HASH subpartition on customer_id in this situation?
Is it because the number of partitions will be higher i.e.
If you go with simple range partitioning, for 5 months you will have ~150 daily range partitions and with each index the count of partition will gets double, for e.g if you will have 10 indexes, the total partitions will be = ~150 table partition+ (10*150)index partition= 1650 total number of partitions.
If OP goes for , range-hash, and hash will mostly have to be 2^N, so say 8, hash sub-partitions , then the total number of partitions will be = (8*150) table partitions+ (8*150*10) index partitions= ~13200 partitions.
Though there are no theoretical limits to the number of partitions in postgres, there are some serious issues noted in the past with higher number of table partitions. One such is below. Is this the reason?
https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits
Out of curiosity, As OP mentioned that there will be Joins and also filters on column Customer_id column , so why don't you think that subpartition by customer_id will be a good option? I understand List subpartition may not be an option considering the new customer_ids gets added slowly in the future(and default list may not be allowed) and also OP mentioned, there is skewed distribution of data for customer_id column. However what is the problem if OP will opt for HASH subpartition on customer_id in this situation?
Is it because the number of partitions will be higher i.e.
If you go with simple range partitioning, for 5 months you will have ~150 daily range partitions and with each index the count of partition will gets double, for e.g if you will have 10 indexes, the total partitions will be = ~150 table partition+ (10*150)index partition= 1650 total number of partitions.
If OP goes for , range-hash, and hash will mostly have to be 2^N, so say 8, hash sub-partitions , then the total number of partitions will be = (8*150) table partitions+ (8*150*10) index partitions= ~13200 partitions.
Though there are no theoretical limits to the number of partitions in postgres, there are some serious issues noted in the past with higher number of table partitions. One such is below. Is this the reason?
https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits
Regards
Veem