Re: Postgresql 14 partitioning advice

Rick Otten <rottenwindfish@xxxxxxxxx> · Fri, 29 Jul 2022 17:44:11 -0400

On Wed, Jul 27, 2022 at 8:55 AM Rick Otten <rottenwindfish@xxxxxxxxx> wrote:
I'm spinning up a new Postgresql 14 database where I'll have to store a couple years worth of time series data at the rate of single-digit millions of rows per day.  Since this has to run in AWS Aurora, I can't use TimescaleDB.

I thought I'd report back some of my findings from testing this week:

I took the same real world two week data set and created identical tables except that I partitioned one by month, one by week, one by day, and one by hour.  I partitioned a little bit into the past and a little bit into the future. I did this on a PG 14.2 RDS instance.  This gave me tables with:
3 partitions, 13 partitions, 90 partitions and 2136 partitions, but otherwise the same data.

Insert times were equivalent.

Then I crafted a query that was one of the main use cases for the data and ran it a bunch of times.

I noticed a significant degradation in performance as the number of partitions increased.  The jump from 13 to 90, in particular, was very steep.  It didn't matter what I set work_mem or other tunables to.  I dug deeper...

Surprising to me was if you partition on a `timestamp with timezone` column, call it "ts":
If your where clause looks like
```
where ts at time zone 'UTC' > '2022-07-01 00:00'::timestamp
```
you will NOT get partition pruning and it will sequence scan.
However if you change it to (with an appropriately adjusted right hand side if necessary):
```
where ts > '2022-07-01 00:00'::timestamp
```
It will do partition pruning and will index scan.

When I made that change the query performance was equivalent regardless of which number of partitions I had in play.
I did a quick test and this happens on a regular timestamp index on a regular table as well.

The other problem I ran into, which I'm still building a test case for and I fear might be a bug if I can easily reproduce it,
is if I did the original select in a CTE, and then did a sort outside of the CTE, even though the CTE found 0 rows, the database
still spent a _ton_ of time sorting those 0 rows:
```
   ->  Sort  (cost=70.03..72.53 rows=1000 width=112) (actual time=84848.452..84848.453 rows=0 loops=1)
```
Once I can reproduce this on test data I'll be able to pin down more closely what is happening and tell if I'm just reading
the explain plan wrong or if something is broken.  It was getting mixed up with the lack of pruning/index usage problem.

I'll report back again next week.  Anyway it is looking to me like it doesn't really matter (within reason) from a performance
perspective how many partitions we use for our data set and query patterns.  We should be able to pick the most convenient
from an archiving and data management perspective instead.