Re: Best COPY Performance

"Worky Workerson" <worky.workerson@xxxxxxxxx> · Mon, 23 Oct 2006 11:40:57 -0400

Markus,

Could you COPY one of your tables out to disk via psql, and then COPY it
back into the database, to reproduce this measurement with your real data?

$ psql -c "COPY my_table TO STDOUT" > my_data
$ ls my_data
2018792 edgescape_pg_load
$ time cat my_data | psql -c "COPY mytable FROM STDIN"
real    5m43.194s
user    0m35.412s
sys     0m9.567s

Also, how much is the disk load, and CPU usage?

 When I am loading via the perl (which I've established is a
bottleneck), the one CPU core is at 99% for the perl and another is at
30% for a postmaster, vs about 90% for the postmaster when going
through psql.

The disk load is where I start to get a little fuzzy, as I haven't
played with iostat to figure what is "normal".  The local drives
contain PG_DATA as well as all the log files, but there is a
tablespace on the FibreChannel SAN that contains the destination
table.  The disk usage pattern that I see is that there is a ton of
consistent activity on the local disk, with iostat reporting an
average of 30K Blk_wrtn/s, which I assume is the log files.  Every
several seconds there is a massive burst of activity on the FC
partition, to the tune of 250K Blk_wrtn/s.

On a table with no indices, triggers and contstraints, we managed to
COPY about 7-8 megabytes/second with psql over our 100 MBit network, so
here the network was the bottleneck.

hmm, this makes me think that either my PG config is really lacking,
or that the SAN is badly misconfigured, as I would expect it to
outperform a 100Mb network.  As it is, with a straight pipe to psql
COPY, I'm only working with a little over 5.5 MB/s.  Could this be due
to the primary key index updates?

Thanks!