Re: How long should it take to insert 200,000 records?

"Merlin Moncure" <mmoncure@xxxxxxxxx> · Tue, 6 Feb 2007 11:40:01 -0500

On 2/6/07, Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> wrote:
On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> I have a pl/pgsql function that is inserting 200,000 records for
> testing purposes.  What is the expected time frame for this operation
> on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
> a 2ghz cpu.  So far I've been sitting here for about 2 million ms
> waiting for it to complete, and I'm not sure how many inserts postgres
> is doing per second.

That really depends.  Doing 200,000 inserts as individual transactions
will be fairly slow.  Since PostgreSQL generally runs in autocommit
mode, this means that if you didn't expressly begin a transaction, you
are in fact inserting each row as a transaction.  i.e. this:

I think OP is doing insertion inside a pl/pgsql loop...transaction is
implied here.  For creating test data, generate_series or
insert...select is obviously the way to go.  If that's unsuitable for
some reason, I would suggest RAISE NOTICE every n records so you can
monitor the progress and make sure something is not binding up in a
lock or something like that.  Be especially wary of degrading
performance during the process.

Another common problem with poor insert performance is a RI check to
an un-indexed column.  In-transaction insert performance should be
between 1k and 10k records/second in normal situations, meaning if you
haven't inserted 1 million records inside of an hour something else is
going on.

Generally, insertion performance from fastest to slowest is:
* insert select generate_series...
* insert select
* copy
* insert (),(),()[...] (at least 10 or preferably 100 insertions)
* begin, prepare, n prepared inserts executed, commit
* begin, n inserts, commit
* plpgsql loop, single inserts
* n inserts outside of transaction.

The order of which is faster might not be absolutely set in stone
(copy might beat insert select for example), but the top 4 methods
will always be much faster than the bottom 4.

merlin