On 2/8/24 1:43 PM, veem v wrote:
On Thu, 8 Feb 2024 at 20:08, Greg Sabino Mullane <htamfids@xxxxxxxxx
<mailto:htamfids@xxxxxxxxx>> wrote:
<snip>
Should we go for simple daily range partitioning on the
transaction_date column?
This one gets my vote. That and some good indexes.
Hello Greg,
Out of curiosity, As OP mentioned that there will be Joins and also
filters on column Customer_id column , so why don't you think that
subpartition by customer_id will be a good option? I understand List
subpartition may not be an option considering the new customer_ids gets
added slowly in the future(and default list may not be allowed) and also
OP mentioned, there is skewed distribution of data for customer_id
column. However what is the problem if OP will opt for HASH subpartition
on customer_id in this situation?
Is it because the number of partitions will be higher i.e.
If you go with simple range partitioning, for 5 months you will have
~150 daily range partitions and with each index the count of partition
will gets double, for e.g if you will have 10 indexes, the total
partitions will be = ~150 table partition+ (10*150)index partition= 1650
total number of partitions.
If OP goes for , range-hash, and hash will mostly have to be 2^N, so say
8, hash sub-partitions , then the total number of partitions will be =
(8*150) table partitions+ (8*150*10) index partitions= ~13200 partitions.
Though there are no theoretical limits to the number of partitions in
postgres, there are some serious issues noted in the past with higher
number of table partitions. One such is below. Is this the reason?
https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits <https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits>
The issue with partitioning by customer_id is that it won't do much (if
anything) to improve data locality. When partitioning by date, you can
at least benefit from partition elimination *IF* your most frequent
queries limit the number of days that the query will look at. Per the
OP, all queries will include transaction date. Note that does NOT
actually mean the number of days/partitions will be limited (ie, WHERE
date > today - 150 will hit all the partitions), but if we assume that
the majority of queries will limit themselves to the past few days then
partitioning by date should greatly increase data locality.
Also, when it comes to customer partitioning... really what you probably
want there isn't partitioning but sharding.
--
Jim Nasby, Data Architect, Austin TX