Re: Savepoints in transactions for speed?

On Tue, Nov 27, 2012 at 6:26 PM, Steve Atkins <steve@xxxxxxxxxxx> wrote:

On Nov 27, 2012, at 2:04 PM, Mike Blackwell <mike.blackwell@xxxxxxx> wrote:

> I need to delete about 1.5 million records from a table and reload it in one transaction.  The usual advice when loading with inserts seems to be to group them into transactions of around 1k records.  Committing at that point would leave the table in an inconsistent state.

I'd probably just do the whole thing in one transaction.

Do you have specific reasons for wanting to avoid a long transaction, or are you just relying on rules of thumb? PostgreSQL isn't going to run out of resources doing a big transaction the way some other databases will.

Long-running transactions will interfere with vacuuming, but inserting a couple of million rows shouldn't take that long.

>  Would issuing a savepoint every 1k or so records negate whatever downside there is to keeping a transaction open for all 1.5 million records, or just add more overhead?


Savepoints are going to increase overhead and have no effect on the length of the transaction. If you want to catch errors and not have to redo the entire transaction, they're great, but that's about it.
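
A minimal sketch of that error-trapping use of savepoints with DBD::Pg's pg_savepoint / pg_rollback_to / pg_release helpers; the table, columns, and get_next_row() here are just placeholders:

$dbh->begin_work;
my $sth = $dbh->prepare("INSERT INTO foo (a, b, c) VALUES (?, ?, ?)");
while (my @row = get_next_row()) {      # get_next_row() is a placeholder
    $dbh->pg_savepoint("row");
    eval { $sth->execute(@row); 1 } or do {
        warn "skipping bad row: $@";    # only this row's work is lost
        $dbh->pg_rollback_to("row");
    };
    $dbh->pg_release("row");
}
$dbh->commit;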

> The data to reload the table is coming from a Perl DBI connection to a different database (not PostgreSQL), so I'm not sure the COPY alternative applies here.

COPY works nicely from perl:

$dbh->do("COPY foo FROM STDIN");            # put the connection into COPY-in mode
$dbh->pg_putcopydata("foo\tbar\tbaz\n");    # one tab-separated, newline-terminated row per call
$dbh->pg_putcopyend();                      # finish the COPY

The details are in DBD::Pg. I use this a lot for doing big-ish (tens of millions of rows) bulk inserts. It's not the fastest loading method there is, but it's probably as fast as you can get from Perl.
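
Since the original poster's rows come from another DBI handle, a rough sketch of wiring that into COPY might look like this (handle names, the query, and the column lists are placeholders, and it assumes the values contain no tabs, newlines, or backslashes that would need COPY escaping):

my $src = $src_dbh->prepare("SELECT a, b, c FROM source_table");
$src->execute;

$pg_dbh->begin_work;
$pg_dbh->do("DELETE FROM foo");                 # clear and reload inside one transaction
$pg_dbh->do("COPY foo (a, b, c) FROM STDIN");
while (my @row = $src->fetchrow_array) {
    # COPY text format: tab-separated columns, \N for NULL, newline-terminated
    $pg_dbh->pg_putcopydata(
        join("\t", map { defined $_ ? $_ : '\N' } @row) . "\n");
}
$pg_dbh->pg_putcopyend();
$pg_dbh->commit;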

Cheers,
  Steve
 
I do this as well - insert a few million rows into a table using the DBD::Pg copy interface.  It works well.

I ended up batching the copies so that each COPY statement only loads a few hundred thousand rows at a time, but it's all still one transaction.

The batching was necessary because of an idiosyncrasy of COPY in Pg 8.1: each COPY statement's contents were buffered in malloc'd space, and with several million rows buffered up, the allocated virtual memory could get quite large - as in several GB.  That, plus the buffer pool, sometimes exceeded the amount of RAM I had available at the time (several years ago), with bad effects on performance.

This may have been fixed since then, or maybe RAM's gotten big enough that it's not a problem.
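
A rough sketch of that batching pattern, with made-up names and batch size, and assuming a source statement handle like the one in the earlier sketch; the whole load still happens in one transaction:

my $batch = 500_000;                     # made-up batch size
my ($n, $in_copy) = (0, 0);

$pg_dbh->begin_work;
while (my @row = $src->fetchrow_array) {
    unless ($in_copy) {
        $pg_dbh->do("COPY foo (a, b, c) FROM STDIN");
        $in_copy = 1;
    }
    $pg_dbh->pg_putcopydata(join("\t", @row) . "\n");
    if (++$n % $batch == 0) {
        $pg_dbh->pg_putcopyend();        # end this COPY so its data gets written out,
        $in_copy = 0;                    # but stay inside the same transaction
    }
}
$pg_dbh->pg_putcopyend() if $in_copy;
$pg_dbh->commit;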

Dan Franklin 
