On 9/8/24 10:45, Peter J. Holzer wrote:
On 2024-09-06 01:44:00 +0530, Lok P wrote:
We are having a requirement to create approx 50 billion rows in a partition
table(~1 billion rows per partition, 200+gb size daily partitions) for a
performance test. We are currently using ' insert into <target table_partition>
select.. From <source_table_partition> or <some transformed query>;' method .
We have dropped all indexes and constraints First and then doing the load.
Still it's taking 2-3 hours to populate one partition.
That seems quite slow. Is the table very wide or does it have a large
number of indexes?
Is there a faster way to achieve this?
Few teammate suggesting to use copy command and use file load instead, which
will be faster.
I doubt that.
I benchmarked several strategies for populating tables 5 years ago and
(for my test data and on our hardware at the time - YMMV) s simple
INSERT ... SELECT was more than twice as fast as 8 parallel COPY
operations (and about 8 times as fast as a single COPY). >
Details will have changed since then (I should rerun that benchmark on
a current system), but I'd be surprised if COPY became that much faster
relative to INSERT ... SELECT.
Yeah they seem to have changed a great deal. Though you are correct in
saying COPY is not faster then INSERT..SELECT
select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
select count(*) from nyc_taxi_duckdb ;
count
---------
2846722
ll -h nyc_taxi.csv
-rw-rw-r-- 1 aklaver aklaver 252M Sep 8 10:54 nyc_taxi.csv
insert into nyc_duplicate select * from nyc_taxi_duckdb ;
INSERT 0 2846722
Time: 7015.072 ms (00:07.015)
truncate nyc_duplicate ;
\copy nyc_duplicate from 'nyc_taxi.csv' with csv header
COPY 2846722
Time: 8760.197 ms (00:08.760)
copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header;
COPY 2846722
Time: 7904.279 ms (00:07.904)
Just to see what the coming attraction offers:
select version();
version
--------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 17rc1 (Ubuntu 17~rc1-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
insert into nyc_duplicate select * from nyc_taxi_duckdb ;
INSERT 0 2846722
Time: 5315.878 ms (00:05.316)
\copy nyc_duplicate from 'nyc_taxi.csv' with csv header
COPY 2846722
Time: 10042.129 ms (00:10.042)
copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header;
COPY 2846722
Time: 8422.503 ms (00:08.423)
hp
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx