Hi. I've only been using PostgreSQL properly for a week or so, so I apologise if this has been covered numerous times, however Google is producing nothing of use.

I'm trying to import a large amount of legacy data (billions of denormalised rows) into a pg database with a completely different schema, de-duplicating bits of it on-the-fly while maintaining a reference count. The procedures to do this have proven easy to write (a cut-down sketch is at the end of this mail), however the speed is not pretty. I've spent some time breaking down the cause, and it comes down to a simple UPDATE, as demonstrated below:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1);

CREATE OR REPLACE FUNCTION test() RETURNS int AS $$
DECLARE
    i int;
BEGIN
    FOR i IN 1..10000 LOOP
        UPDATE foo SET b = b + 1 WHERE a = 1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

When run individually, this function produces the following timings:

Time: 1912.593 ms
Time: 1937.183 ms
Time: 1941.607 ms
Time: 1943.673 ms
Time: 1944.738 ms

However, when run inside a transaction (to simulate a function doing the same work) I get this:

START TRANSACTION;
Time: 0.836 ms
Time: 1908.931 ms
Time: 5370.110 ms
Time: 8809.378 ms
Time: 12274.294 ms
Time: 15698.745 ms
Time: 19218.389 ms

There is no disk I/O and the postgresql process runs at 100% CPU. The server is amd64 FreeBSD 8-STABLE with 16GB RAM, running PostgreSQL 9.0.3 from packages.

Looking at the timing of the real data (which is heavily grouped), it seems the speed of UPDATEs can vary depending on how heavily updated a row is, so I set out to produce a test case:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1),(2,1),(3,1),(4,1);

CREATE OR REPLACE FUNCTION test(int) RETURNS int AS $$
DECLARE
    i int;
BEGIN
    FOR i IN 1..10000 LOOP
        UPDATE foo SET b = 1 WHERE a = $1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

START TRANSACTION;
SELECT test(1);
Time: 1917.305 ms
SELECT test(2);
Time: 1926.758 ms
SELECT test(3);
Time: 1926.498 ms
SELECT test(1);
Time: 5376.691 ms
SELECT test(2);
Time: 5408.231 ms
SELECT test(3);
Time: 5403.654 ms
SELECT test(1);
Time: 8842.425 ms
SELECT test(4);
Time: 1925.954 ms
COMMIT;

START TRANSACTION;
SELECT test(1);
Time: 1911.153 ms

As you can see, the more an individual row is updated /within a transaction/, the slower it becomes for some reason. Unfortunately, in my real-world case I need to do many billions of these UPDATEs. Is there any way I can get around this without pulling my huge source table out of the database and feeding everything in line-at-a-time from outside the database?

Thanks.
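P.S. In case the shape of the workload matters, here is a cut-down sketch of the kind of de-duplicating, reference-counting procedure I mean. The table, column and function names here are invented purely for illustration; the real schema is quite different, but the hot path is the same "update a counter on the same row over and over" pattern shown above.

-- Illustrative only: real table/column names differ.
CREATE TABLE dedup_values (
    id   serial PRIMARY KEY,
    val  text UNIQUE,
    refs int NOT NULL DEFAULT 1
);

CREATE OR REPLACE FUNCTION add_value(p_val text) RETURNS int AS $$
DECLARE
    v_id int;
BEGIN
    -- If the value is already known, just bump its reference count.
    UPDATE dedup_values SET refs = refs + 1 WHERE val = p_val
        RETURNING id INTO v_id;
    IF FOUND THEN
        RETURN v_id;
    END IF;
    -- Otherwise store it with a reference count of one.
    INSERT INTO dedup_values (val) VALUES (p_val) RETURNING id INTO v_id;
    RETURN v_id;
END;
$$ LANGUAGE plpgsql;

(The import is driven by a single procedure, so everything ends up inside one transaction, which is why I simulated it that way above; with concurrent writers this update-then-insert pattern would obviously need more care.)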