On 10/21/06, Worky Workerson <worky.workerson@xxxxxxxxx> wrote:
What is the best COPY performance that you have gotten on a "normal" table? I know that this is question is almost too general, but it might help me out a bit, or at least give me the right things to tweak. Perhaps the question can be rewritten as "Where are the major bottlenecks in a COPY?" or "How can I compute the max theoretical COPY performance for my hardware?". The two subquestions that I have from this are: -Are my ETL scripts (perl) maximizing the database COPY speeds? -Can I tweak my DB further to eek out a bit more performance? I'm using perl to ETL a decent sized data set (10 million records) and then loading it through perl::DBI's copy. I am currently getting between 10K and 15K inserts/second. I've profiled the ETL scripts a bit and have performance-improved a lot of the code, but I'd like to determine whether it makes sense to try and further optimize my Perl or count it as "done" and look for improvements elsewhere. I ran trivial little insert into a table with a single integer row and came close to 250K inserts/second using psql's \copy, so I'm thinking that my code could be optimized a bit more, but wanted to check around to see if that was the case. I am most interested in loading two tables, one with about 21 (small) VARCHARs where each record is about 200 bytes, and another with 7 INTEGERs, 3 TIMESTAMPs, and 1 BYTEA where each record is about 350 bytes.
indexes/keys? more memory for sorting during index creation can have a dramatic affect on bulk insert performance. check for pg_tmp folders popping up during copy run.
I have implemented most of the various bits of PG config advice that I have seen, both here and with a some googling, such as: wal_buffers=128 checkpoint_segments=128 checkpoint_timeout=3000 Software: PG 8.1.3 on RHEL 4.3 x86_64 Hardware: Quad Dual-core Opteron, Fibre Channel SAN with 256M BBC
for table light on indexes, 10-15k for copy is pretty poor. you can get pretty close to that with raw inserts on good hardware. I would suggest configuirng your perl script to read from stdin and write to stdout, and pipe it to psql using copy from stdin. then just benchmark your perl script redirecting output to a file. merlin