On Wed, Aug 25, 2010 at 08:47:10PM +0200, Wojciech Strzałka wrote:
> The data set is 9mln rows - about 250 columns

250 columns sounds very strange to me as well! I start getting worried
when I hit a tenth of that.

> CPU utilization - 1,2% (half of the one core)
> iostat shows writes ~6MB/s, 20% busy
> when I run 2 loads in parallel the CPU is split to 2*0,6%, hdd write
> ~7MB (almost the same)

If you've got indexes set up on the table then I'd expect this sort of
behavior; you could try dropping them before the COPY and recreating
them afterwards (a minimal example is sketched at the end of this
message).

It would be great if PG could do these sorts of bulk index updates
automatically! Maybe run the first few tens or hundreds of changes
through the index as normal, then start logging the rows that will need
indexing and bulk process and merge them at the end. Concurrent access
seems a bit more complicated, but shouldn't be too bad.

The case of a UNIQUE index seems to require a change in behavior. For
example, say the following are executed concurrently:

  Client A: COPY foo (id) FROM stdin;
  Client B: INSERT INTO foo (id) VALUES (1);

with A starting before and finishing after B, and A sending a row with
id=1. At the moment A's data is indexed immediately and hence B's
conflicting change fails. If PG did bulk index merging at the end, this
would change: B would succeed and A would fail when the index was
brought up to date. These semantics are still compatible with SQL, just
different from before, so some code may be (incorrectly) relying on the
current behavior.

I've read the discussions at:

  http://archives.postgresql.org/pgsql-hackers/2008-02/msg00811.php
  http://archives.postgresql.org/pgsql-general/2008-01/msg01048.php

but haven't found much more recent. It seems to hold together better
than the first suggestion. The second post notes that you may be better
off working in work_mem-sized batches to help prevent spilling to disk.
That sounds reasonable, and if it's OK to assume the new rows will be
physically close to each other then they can be recorded as
ranges/run-length encoded, reducing the chance of spilling to disk even
for very large inserts. As per the second post, I'm struggling with
BEFORE INSERT triggers as well; their semantics seem to preclude most of
these optimizations.

> what's also interesting - table is empty when I start (by truncate)
> but while the COPY is working, I see it grows (by \d+ or
> pg_total_relation_size) about 1MB per second
> what I'd expect it should grow at checkpoints only, not all the
> time - am I wrong?

AFAIU, it'll grow constantly; the table file is extended as each new
page of rows is filled, while checkpoints only control when dirty
buffers have to be flushed out, not when the file grows.

--
  Sam  http://samason.me.uk/
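
A minimal sketch of the drop/COPY/recreate approach mentioned above;
the table (foo), index name (foo_id_idx) and data file path are made up
for illustration, so adjust them to your own schema:

  BEGIN;
  -- drop secondary indexes so COPY doesn't have to maintain them row by row
  DROP INDEX IF EXISTS foo_id_idx;
  COPY foo (id) FROM '/tmp/foo.dat';
  -- rebuild in one pass after the load; usually much faster overall
  CREATE INDEX foo_id_idx ON foo (id);
  COMMIT;

Note that indexes backing PRIMARY KEY or UNIQUE constraints can't be
dropped directly with DROP INDEX; those need ALTER TABLE ... DROP
CONSTRAINT instead.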