
Re: How to do faster DML

On Sun, Feb 4, 2024 at 10:30 AM Lok P <loknath.73@xxxxxxxxx> wrote:
On Sun, Feb 4, 2024 at 8:14 PM Dennis White <dwhite@xxxxxxxxxxxxxxx> wrote:
I'm surprised no one has mentioned that perhaps it's a good idea to partition this table while adding the PK. By your own statements, the table is difficult to work with as is. Without partitioning, every row insert must walk the PK index, which could be a factor. If this is a static table then perhaps that's OK, but if not...

Anyway, I don't recall what type the ID was or how it's set, but I suggest you seriously investigate using it to partition this table into manageable smaller tables.
Best of luck.

On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro <marcos@xxxxxxxxxx> wrote:
insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 20_000_000;

You can use min/max values, grouping them by 10_000_000 records, so you don't need that offset; then generate the commands and run them.

select format('insert into mytable2 select * from mytable1 where i between %s and %s;', min(i), max(i)) from mytable1 group by i/10_000_000;
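A minimal sketch of running those generated commands directly from psql with \gexec, assuming mytable1 has an integer key column i (underscore digit separators like 10_000_000 require PostgreSQL 16 or later; on older versions write 10000000):

```sql
-- Generate one INSERT per ~10M-row range of i, then feed each
-- generated statement back to the server with psql's \gexec.
select format(
    'insert into mytable2 select * from mytable1 where i between %s and %s;',
    min(i), max(i))
from mytable1
group by i / 10_000_000
order by min(i) \gexec
```

\gexec is a psql feature, not server-side SQL, so this only works from an interactive psql session or a psql-driven script.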

Thank you so much.

You are correct. It has been difficult to operate on this table. Every read query is struggling, so partitioning is something we must consider, hoping that Postgres will be able to handle this scale with a proper partitioning and indexing strategy.

I have a few related questions. 

1) Even after partitioning the target table, to speed up the data load, is there an option to disable the primary and foreign keys and re-enable them once the load finishes? Would that be a good idea, or would re-enabling the constraints after persisting all the data be cumbersome/resource intensive?

1. Load the children before attaching them to the parent.
2. Create the child indices, PK and FKs before attaching to the parent.
3. Do step 2 in multiple parallel jobs.  (cron is your friend.)
4. Attach the children to the "naked" (no PK, no FK, no indices) parent.
5. Use the ONLY keyword on CREATE INDEX and ALTER TABLE to add the indices, PK and FK to the parent after step 4.
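The steps above can be sketched roughly as follows. All table and column names here are hypothetical, and this assumes a range-partitioned parent keyed on a date column:

```sql
-- Parent starts out "naked": no PK, no FK, no indexes.
create table mytable2 (
    id       bigint,
    txn_date date,
    payload  text
) partition by range (txn_date);

-- Steps 1-2: load a standalone child, then index/constrain it
-- BEFORE attaching. (Step 3: run one such job per child in
-- parallel, e.g. from cron.)
create table mytable2_2024_02_04 (like mytable2);
insert into mytable2_2024_02_04
    select * from mytable1
    where txn_date >= '2024-02-04' and txn_date < '2024-02-05';
alter table mytable2_2024_02_04 add primary key (id);
create index mytable2_2024_02_04_txn_date_idx
    on mytable2_2024_02_04 (txn_date);

-- Step 4: attach the finished child to the naked parent.
alter table mytable2 attach partition mytable2_2024_02_04
    for values from ('2024-02-04') to ('2024-02-05');

-- Step 5: add the parent-level index with ONLY (it starts
-- invalid), then attach each matching child index to it.
create index mytable2_txn_date_idx on only mytable2 (txn_date);
alter index mytable2_txn_date_idx
    attach partition mytable2_2024_02_04_txn_date_idx;
```

The point of ONLY in step 5 is that the parent index is created as a shell without touching the already-indexed children; it becomes valid once an index has been attached for every partition.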


2) I understand there is theoretically no limitation on the number or size of partitions a table can have in Postgres. But I want to know from the experts here, from real-life experience, whether there is anything we should take care of before deciding the partitioning strategy, so that we stay within the practical soft limits (both with respect to size and number of partitions).
Note: Currently this table will be around ~1TB in size and will hold approx ~3 billion rows (post removal of duplicates), but per business need it may grow up to ~45 billion rows in future.

I'd keep it under 200 partitions, but my experience might be outdated.
 
3) As the size of the table, or of each partition, is going to be very large, and this will be a read-intensive application,

By PK?  If so, partition by PK.
 
compressing the historical partitions will help us save storage space and will also help read query performance. So can you please throw some light on the compression strategy we should follow here (considering a daily range-partitioned table with transaction_date as the partition key)?
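For reference, the daily range partitioning described in the question might look like the following sketch (hypothetical table and column names):

```sql
-- Daily range partitioning on transaction_date, one child per day.
create table transactions (
    transaction_id   bigint,
    transaction_date date,
    detail           text
) partition by range (transaction_date);

-- Each daily partition covers a half-open one-day range.
create table transactions_2024_02_04
    partition of transactions
    for values from ('2024-02-04') to ('2024-02-05');
```

With this layout, older daily partitions can be managed individually (moved to a different tablespace, detached, or archived) without touching the current day's data.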

Regards
Lok



