On 9/5/24 14:14, Lok P wrote:
Hi,
We have a requirement to create approximately 50 billion rows in a
partitioned table (~1 billion rows per partition, 200+ GB daily
partitions) for a performance test. We are currently using the
'insert into <target table_partition> select ... from
<source_table_partition> or <some transformed query>;' method. We
dropped all indexes and constraints first and then ran the load, but
it still takes 2-3 hours to populate one partition. Is there a faster
way to achieve this?
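For context, the load currently looks roughly like this (a minimal
sketch; the table and partition names are placeholders, and
synchronous_commit = off is an assumption about what is safe to relax
for a throwaway test load):

    -- One statement per daily partition; names are placeholders.
    SET synchronous_commit = off;  -- don't wait for WAL flush at commit;
                                   -- acceptable for a disposable load
    INSERT INTO target_tab_p2024_09_05
    SELECT /* transformation */ *
    FROM source_tab_p2024_09_05;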
A few teammates suggested using the COPY command with a file load
instead, saying it would be faster. So I wanted to understand what
COPY does differently behind the scenes compared to INSERT ... SELECT,
given that the latter stays entirely within the SQL engine.
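A minimal sketch of the COPY variant (the file path and table name are
placeholders; the FREEZE option is only legal here because the table
is truncated in the same transaction):

    BEGIN;
    TRUNCATE target_tab_p2024_09_05;     -- makes FREEZE legal below
    COPY target_tab_p2024_09_05
        FROM '/data/p2024_09_05.csv'
        WITH (FORMAT csv, FREEZE true);  -- rows are written already frozen,
                                         -- saving later vacuum/freeze work
    COMMIT;

If the server can't read the file directly, psql's \copy streams it
from the client instead (but without the FREEZE shortcut above unless
wrapped the same way).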
Additionally, when we tried to create indexes on one partition after
the data load, it took 30+ minutes. Is there any way to make that faster?
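For reference, a sketch of the knobs that usually matter for a
post-load index build (the values are assumptions to be tuned against
available RAM and cores; the index definition is a placeholder):

    SET maintenance_work_mem = '8GB';          -- more sort memory for the build
    SET max_parallel_maintenance_workers = 8;  -- parallel B-tree build workers
    CREATE INDEX target_p2024_09_05_col1_idx
        ON target_tab_p2024_09_05 (col1);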
Is there any way to run these steps in parallel so as to utilize the
full database resources?
It's Postgres 15.4.
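On Postgres 15 a data-modifying INSERT ... SELECT won't use parallel
query, so the usual workaround is to split the source by key range or
hash and run one slice per session; a sketch assuming a numeric id
column (the slice count and names are placeholders):

    -- Run each statement in its own session/connection (4 slices shown).
    INSERT INTO target_tab_p2024_09_05
    SELECT * FROM source_tab_p2024_09_05 WHERE id % 4 = 0;

    INSERT INTO target_tab_p2024_09_05
    SELECT * FROM source_tab_p2024_09_05 WHERE id % 4 = 1;
    -- ...and likewise for slices 2 and 3.

Index builds on different partitions can likewise run concurrently,
one session per partition.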
Regards
Lok
Try pg_bulkload to load the data; it takes a little setup, but it is
very fast. Do pay attention to the caveats, though for a performance
test they probably won't be relevant.
https://github.com/ossc-db/pg_bulkload?tab=readme-ov-file
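Roughly, you point it at a control file. A minimal sketch based on the
project's sample (the paths and names are placeholders; check the keys
and defaults against the README):

    # sample.ctl
    OUTPUT = public.target_tab_p2024_09_05   # target table
    INPUT = /data/p2024_09_05.csv            # input data file
    TYPE = CSV                               # input format
    DELIMITER = ","                          # field delimiter
    WRITER = DIRECT                          # write data files directly,
                                             # bypassing shared buffers/WAL

and then run something like: pg_bulkload sample.ctl -d mydb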
Jeff