Re: Improve COPY performance for large data sets

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10 Sep 08, at 19:16, Bill Moran wrote:
> There's a program called pgloader which supposedly is faster than copy.
> I've not used it so I can't say definitively how much faster it is.

In fact, pgloader uses COPY under the hood, and it does so over a client connection (which may be a Unix-domain socket). Server-side COPY, by contrast, reads the file content directly from the local filesystem. So no, pgloader by itself will not be faster than COPY.

That said, pgloader can split the workload across as many threads as you want. It could therefore saturate the I/O subsystem when the disks are fast enough that a single CPU cannot keep them busy. Two parallel loading modes are supported. In the first, N parts of the file are processed by N threads. In the second, one thread reads and parses the file and fills queues, from which N threads send COPY commands to the server.

Now, it could be that pgloader in a parallel setup performs better than plain COPY on the server. This remains to be tested. The use case at hand reportedly involves data files of hundreds of GB, or even several TB, and I don't have the facilities to test-drive such a setup...

Note that these pgloader parallel options were requested by PostgreSQL hackers as a testbed for some ideas about a parallel pg_restore. Maybe re-explaining what has been implemented will reopen this can of worms :)

Regards,
- --
dim

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkjINB0ACgkQlBXRlnbh1bmhkgCgu4TduBB0bnscuEsy0CCftpSp
O5IAoMsrPoXAB+SJEr9s5pMCYBgH/CNi
=1c5H
-----END PGP SIGNATURE-----

