Re: suggestion about time based partitioning and hibernate

Marc Millas <marc.millas@xxxxxxxxxx> · Tue, 25 Jul 2023 12:23:42 +0200

On Tue, Jul 18, 2023 at 8:18 AM Luca Ferrari <fluca1978@xxxxxxxxx> wrote:
Dear all,

I'm looking for ideas here, and it could be someone already stepped

into declarative partitioning of an existing database where Hibernate

(a Java ORM) handles the tables.

The situation is as follows:

create table foo( id primary key, a_date date, ... );

Now, the trivial way to partition this would be on a range based on

a_date, so that the primary key of the tables shifts from id to (id,

a_date). One thing that frightens me is that Hibernate does a lot of

per-row lookups by means of the id, so while the partitioning is

probably going to make things more manageable and even faster in some

scenarios, could lead to drawbacks when Hibernate queries by id.

Moreover, hibernate will think id is unique while it is not anymore.

Last but not least, referencing foreign keys are made by Hibernate

thru the id column, and it means that incoming foreign keys to foo

will not be in place anymore.

Now, I know that I can define a composite key in hibernate, in order

to match the effective new data structure, but this requires a huge

rewrite of the application code.

And after all, we are talking about a non-PostgreSQL related piece, so

the problem is not on the PostgreSQL side.

Anyone has already done a partitioning in such scenario?

I am thinking that partitioning on an hash of id could be the only way

to go without having to touch the hibernate side, even if this would

bring up a less balanced partitioned structure. In such case, I mean

partitioning by hash, having a table with 60 millions rows per 50 GB

in size, what would be the rule of thumb to select the number of

partitions (i.e., a suggested modulus)?

I will appreciate any suggestion.

Thanks,

Luca

Hello,
quite often, in such cases, the id is a sequence or something like that.
so, partitioning on id range or a_date range is similar, on a maintenance point of vue. 
and, obviously if you partition by id range, no pk pb.
the difference is the SQL plan to access the data, as to prune partitions, Postgres need to have the partition key in the where clause.
obviously, if your code do access the data thru id... then partition by id.

  Marc MILLAS