Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

Craig Ringer wrote:
> I'm encountering an odd issue with a bulk import query using PostgreSQL
> 8.3. After a 400,000 row import into a just-truncated table `booking', a
> sequential scan run on the table in the same transaction is incredibly
> slow, taking ~ 166738.047 ms. After a:
> 	`COMMIT; BEGIN;'
> the same query runs in 712.615 ms, with almost all the time difference
> being in the sequential scan of the `booking' table [schema at end of post].
>
> The table is populated by a complex pl/pgsql function that draws from
> several other tables to convert data from another app's format.

You must have an exception handler block in that pl/pgsql function. Entering such a block implicitly creates a new subtransaction on each invocation, so you end up with hundreds of thousands of committed subtransactions. For each row in the seq scan, the list of subtransactions is scanned to see whether the transaction that inserted the row is part of the current top-level transaction. That's fine for a handful of subtransactions, but it gets really slow with large numbers of them, as you've seen. Overall it's an O(n^2) operation, where n is the number of rows inserted, so you'll be in even more trouble if the number of rows increases.
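To illustrate, a minimal sketch of that pattern (the table and function names here are made up, since the original function wasn't posted) is enough to reproduce it. The main block's EXCEPTION clause forces a subtransaction on every call, whether or not an exception is ever raised:

CREATE TABLE booking_test (id int PRIMARY KEY);

CREATE OR REPLACE FUNCTION insert_one_booking(i int) RETURNS void AS $$
BEGIN
    INSERT INTO booking_test (id) VALUES (i);
EXCEPTION WHEN unique_violation THEN
    NULL;  -- the error is swallowed, but entering the block
           -- already cost one subtransaction
END;
$$ LANGUAGE plpgsql;

-- 400,000 calls in one top-level transaction
-- = 400,000 committed subtransactions
BEGIN;
SELECT insert_one_booking(i) FROM generate_series(1, 400000) AS i;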

As a work-around, avoid the exception handler, process more than one row per function invocation, or simply COMMIT the transaction, as you did.
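For example, with the same made-up names as above, moving the error handling out of the per-row path makes the whole batch share a single subtransaction:

CREATE OR REPLACE FUNCTION insert_bookings(n int) RETURNS void AS $$
BEGIN
    INSERT INTO booking_test (id)
    SELECT i FROM generate_series(1, n) AS i;
EXCEPTION WHEN unique_violation THEN
    RAISE NOTICE 'batch failed; clean up or re-raise here';
END;
$$ LANGUAGE plpgsql;

-- one call, one subtransaction, regardless of row count
SELECT insert_bookings(400000);

The trade-off is that a single bad row rolls back the whole batch, so this suits imports where failures are rare.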

For 8.4, it would be nice to improve that. I tested this on my laptop with a similarly sized table, inserting each row in a pl/pgsql function with an exception handler, and I got very similar run times. According to oprofile, all the time is spent in TransactionIdIsInProgress. I think it would be pretty straightforward to store the committed subtransaction ids in a sorted array instead of a linked list and use a binary search, or to use a hash table. That should eliminate this problem, though there are other places as well where a large number of subtransactions will hurt performance.
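In the meantime, if you want to confirm you're hitting this, timing the same seq scan before and after COMMIT (as in your report) makes it obvious. With the made-up function from above, in a psql session:

\timing
BEGIN;
SELECT insert_one_booking(i) FROM generate_series(1, 400000) AS i;
SELECT count(*) FROM booking_test;  -- slow: the subxact list is
                                    -- checked for every tuple
COMMIT;
SELECT count(*) FROM booking_test;  -- fast: the inserting transaction
                                    -- is known committed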

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
