Search Postgresql Archives

Re: general questions postgresql performance config

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26/01/2010 12:15 AM, Dino Vliet wrote:

5) Other considerations?


To get optimal performance for bulk loading you'll want to do concurrent data loading over several connections - up to as many as you have disk spindles. Each connection will individually be slower, but the overall throughput will be much greater.

Just how many connections you'll want to use depends on your I/O subsystem and to a lesser extent your CPU capacity.

Inserting data via multiple connections isn't something every data loading tool supports, so make sure to consider this carefully.


Another consideration is how you insert the data. It's vital to insert your data in large transaction-batched chunks (or all in one transaction) ; even with synchronized_commit = off you'll still see rather poor performance if you do each INSERT in its own transaction. Doing your inserts as prepared statements where each INSERT is multi-row valued will help too.

Even better is to use COPY to load large chunks of data. libpq provides access to the COPY interface if you feel like some C coding. The JDBC driver (dev version only so far) now provides access to the COPY API, so you can also bulk-load via Java very efficiently now.

If your data needs little/no transformation and is externally validated you can look into pg_bulkload as an alternative to all this.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux