On 2018-10-10 17:19:50 -0400, Ravi Krishna wrote:
> > On Oct 10, 2018, at 17:18 , Andres Freund <andres@xxxxxxxxxxx> wrote:
> > On October 10, 2018 2:15:19 PM PDT, Ravi Krishna <srkrishna1@xxxxxxx> wrote:
> >> If I have a large file with say 400 million rows, can I first split it
> >> into 10 files of 40 million rows each and then fire up 10 different
> >> COPY sessions , each reading from a split file, but copying into the
> >> same table. I thought not. It will be great if we can do this.
> >
> > Yes, you can.
> >
> Thank you. Let me test it and see the benefit. We have a use case for this.

You should of course test this on your own hardware with your own data,
but here are the results of a simple benchmark (import 1 million rows
into a table without indexes via different methods) I ran a few weeks
ago on one of our servers:

https://github.com/hjp/dbbench/blob/master/import_pg_comparison/results/claudrin.2018-09-22/results.png

The y axis is rows per second. The x axis shows the different runs,
sorted from slowest to fastest (so 2 is the median).

As you can see, it doesn't parallelize perfectly: 2 copy processes are
only about 50 % faster than 1, and 4 are about 33 % faster than 2. But
there is still quite a respectable performance boost.

        hp

PS: The script is of course in the same repo, but I didn't include the
test data because I don't think I'm allowed to include that.

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@xxxxxx         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
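
For illustration, the split-then-parallel-COPY approach discussed in this thread could be sketched roughly as follows. This is not the benchmark script from the repo mentioned above, just a minimal sketch under several assumptions: every row occupies exactly one line of the input file, the data is in COPY's default text format, and psql is on the PATH with connection settings taken from the environment. The names split_file, copy_commands, run_parallel, big_table and mydb are all hypothetical. Rows are distributed round-robin rather than in contiguous blocks so that the file can be streamed without counting its lines first; since all parts go into the same table, the order should not matter.

```python
import subprocess
from pathlib import Path


def split_file(src, n_parts, out_dir):
    """Distribute the lines of src round-robin over n_parts files.

    Assumes one row per line (no embedded newlines inside a row).
    Returns the list of part paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i:02d}.txt" for i in range(n_parts)]
    outs = [p.open("w", encoding="utf-8", newline="") for p in paths]
    try:
        with open(src, encoding="utf-8", newline="") as f:
            for lineno, line in enumerate(f):
                outs[lineno % n_parts].write(line)
    finally:
        for o in outs:
            o.close()
    return paths


def copy_commands(table, paths, dbname):
    """Build one psql invocation per part, each running a client-side \\copy.

    Assumes paths and the table name contain no single quotes or spaces
    that would need extra quoting.
    """
    return [
        ["psql", "-d", dbname, "-c", f"\\copy {table} FROM '{p}'"]
        for p in paths
    ]


def run_parallel(commands):
    """Start all commands at once and wait for them; returns the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


# Hypothetical usage (needs a reachable database and an existing table):
#   parts = split_file("rows.txt", 4, "parts")
#   exit_codes = run_parallel(copy_commands("big_table", parts, "mydb"))
```

Each psql process opens its own session, so the parts are loaded concurrently into the same table. How many parts are worthwhile depends on the hardware; the figures quoted above suggest diminishing returns as the number of processes grows.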