Markus,
Could you COPY one of your tables out to disk via psql, and then COPY it back into the database, to reproduce this measurement with your real data?
$ psql -c "COPY my_table TO STDOUT" > my_data $ ls my_data 2018792 edgescape_pg_load $ time cat my_data | psql -c "COPY mytable FROM STDIN" real 5m43.194s user 0m35.412s sys 0m9.567s
Also, how much is the disk load, and CPU usage?
When I am loading via the perl (which I've established is a bottleneck), the one CPU core is at 99% for the perl and another is at 30% for a postmaster, vs about 90% for the postmaster when going through psql. The disk load is where I start to get a little fuzzy, as I haven't played with iostat to figure what is "normal". The local drives contain PG_DATA as well as all the log files, but there is a tablespace on the FibreChannel SAN that contains the destination table. The disk usage pattern that I see is that there is a ton of consistent activity on the local disk, with iostat reporting an average of 30K Blk_wrtn/s, which I assume is the log files. Every several seconds there is a massive burst of activity on the FC partition, to the tune of 250K Blk_wrtn/s.
On a table with no indices, triggers and contstraints, we managed to COPY about 7-8 megabytes/second with psql over our 100 MBit network, so here the network was the bottleneck.
hmm, this makes me think that either my PG config is really lacking, or that the SAN is badly misconfigured, as I would expect it to outperform a 100Mb network. As it is, with a straight pipe to psql COPY, I'm only working with a little over 5.5 MB/s. Could this be due to the primary key index updates? Thanks!