Re: Best COPY Performance

"Merlin Moncure" <mmoncure@xxxxxxxxx> · Sat, 21 Oct 2006 06:41:14 +0530

On 10/21/06, Worky Workerson <worky.workerson@xxxxxxxxx> wrote:
What is the best COPY performance that you have gotten on a "normal" table?

I know that this is question is almost too general, but it might help
me out a bit, or at least give me the right things to tweak.  Perhaps
the question can be rewritten as "Where are the major bottlenecks in a
COPY?" or "How can I compute the max theoretical COPY performance for
my hardware?".  The two subquestions that I have from this are:
  -Are my ETL scripts (perl) maximizing the database COPY speeds?
  -Can I tweak my DB further to eek out a bit more performance?

I'm using perl to ETL a decent sized data set (10 million records) and
then loading it through perl::DBI's copy.  I am currently getting
between 10K and 15K inserts/second.  I've profiled the ETL scripts a
bit and have performance-improved a lot of the code, but I'd like to
determine whether it makes sense to try and further optimize my Perl
or count it as "done" and look for improvements elsewhere.

I ran trivial little insert into a table with a single integer row and
came close to 250K inserts/second using psql's \copy, so I'm thinking
that my code could be optimized a bit more, but wanted to check around
to see if that was the case.

I am most interested in loading two tables, one with about 21 (small)
VARCHARs where each record is about 200 bytes, and another with 7
INTEGERs, 3 TIMESTAMPs, and 1 BYTEA where each record is about 350
bytes.

indexes/keys?  more memory for sorting during index creation can have
a dramatic affect on bulk insert performance.  check for pg_tmp
folders popping up during copy run.

I have implemented most of the various bits of PG config advice that I
have seen, both here and with a some googling, such as:

 wal_buffers=128
 checkpoint_segments=128
 checkpoint_timeout=3000

Software:  PG 8.1.3 on RHEL 4.3 x86_64
Hardware: Quad Dual-core Opteron, Fibre Channel SAN with 256M BBC

for table light on indexes, 10-15k for copy is pretty poor.  you can
get pretty close to that with raw inserts on good hardware. I would
suggest configuirng your perl script to read from stdin and write to
stdout, and pipe it to psql using copy from stdin.  then just
benchmark your perl script redirecting output to a file.

merlin