
Re: How to do faster DML


 




On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane <htamfids@xxxxxxxxx> wrote:
On Sun, Feb 4, 2024 at 3:52 PM Lok P <loknath.73@xxxxxxxxx> wrote:
This table will always be queried on the transaction_date column as one of the filter criteria. But the querying/search criteria can span from a day to a month's worth of transaction date data.

In that case, and based on the numbers you provided, daily partitioning seems a decent solution.
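(A daily range-partitioning setup along those lines might be sketched as follows; the table and column names are placeholders, not taken from this thread.)

```sql
-- Hypothetical sketch of daily range partitioning on transaction_date.
CREATE TABLE transactions (
    transaction_id   bigint      NOT NULL,
    transaction_date timestamptz NOT NULL
    -- ... remaining columns ...
) PARTITION BY RANGE (transaction_date);

-- One child partition per day, e.g. for 2024-02-06:
CREATE TABLE transactions_2024_02_06
    PARTITION OF transactions
    FOR VALUES FROM ('2024-02-06') TO ('2024-02-07');
```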

What I meant was, we will definitely need the data for querying purposes by the users, but just to keep the storage space in control (and to help the read queries), we were thinking of having the historical partitions compressed. And for that, is there any compression strategy we should follow on Postgres?

There is no compression strategy, per se. There are ways Postgres internally compresses the data (see "TOAST"), but in general, the table is either there or not. If space is a concern you should examine if you truly need 127 columns, make sure they are arranged correctly (search for 'postgres column tetris'), and move unused and older partitions elsewhere - like a separate cheaper Postgres server, or something offline.
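(As a sketch of the "column tetris" idea: the system catalogs expose each column's fixed width and alignment, which is what determines the padding; the table name below is a placeholder.)

```sql
-- List columns with their storage width and alignment requirement.
-- Arranging fixed-width columns widest-first minimizes alignment padding.
SELECT a.attname,
       t.typname,
       t.typlen,    -- -1 means variable-width (varlena)
       t.typalign   -- d = 8-byte, i = 4-byte, s = 2-byte, c = 1-byte
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'transactions'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY t.typlen DESC;
```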

With regards to loading data into the table faster, I am wondering why you said 'NO' to the approach of loading the data first and creating/enabling the primary key and foreign key constraints afterward.

For an initial load, this is fine, if you do things very carefully. I'd build the child table indexes post load but still feed things into the main table as an initial tradeoff, but YMMV.
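(A minimal sketch of that initial-load sequence, with placeholder table, column, and file names: bulk-load first, then build indexes and constraints in one pass.)

```sql
-- Load the bulk data first; no per-row index maintenance happens here.
COPY transactions FROM '/path/to/initial_load.csv' WITH (FORMAT csv, HEADER true);

-- Then build indexes and constraints over the already-loaded data.
CREATE INDEX ON transactions (transaction_date);
ALTER TABLE transactions
    ADD PRIMARY KEY (transaction_id, transaction_date);
```

Note that on a partitioned table the primary key must include the partition key, which is why transaction_date appears in it here.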

Just looked back and saw this is actually Aurora, not Postgres. Most of the advice on this thread should still apply, but be aware that things are not the same as Postgres, especially at the storage layer. For all the money you are giving them, don't forget to bug them as well.

Cheers,
Greg


Thank you so much Greg. That helps.

So when you said "In that case, and based on the numbers you provided, daily partitioning seems a decent solution.", does it mean that the size of each partition (which will be ~100-150GB per partition here) is not an issue for Postgres when serving read-latency-sensitive queries?

And yes, I think you are spot on in regards to the storage. Actually, we loaded CSV files whose sizes summed to ~200GB, and I was wondering how the table size became ~1TB once that data was loaded into the database. I was not aware of "column tetris" and how Postgres adds alignment padding to column data based on the types of the subsequent columns; I am still trying to digest the concept. We do have many columns with NULL values in them, though. We will try to reorder the columns in the table, and hopefully we will be able to really reduce the storage space through that. Will raise a case too.

Regards
Lok
 
