On 28/11/17, Rob Sargent (robjsargent@xxxxxxxxx) wrote:
> On 11/28/2017 10:50 AM, Ted Toth wrote:
> > On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent <robjsargent@xxxxxxxxx> wrote:
> > > On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@xxxxxxxxx> wrote:
> > > >
> > > > I'm writing a migration utility to move data from a non-rdbms data
> > > > source to a postgres db. Currently I'm generating SQL INSERT
> > > > statements involving 6 related tables for each 'thing'. With 100k or
> > > > more 'things' to migrate I'm generating a lot of statements, and when
> > > > I try to import using psql, postgres fails with 'out of memory' when
> > > > running on a Linux VM with 4G of memory. If I break the import into
> > > > smaller chunks, say ~50K statements, then it succeeds. I can change
> > > > my migration utility to generate multiple files, each with a limited
> > > > number of INSERTs, to get around this issue, but maybe there's
> > > > another/better way?
> > >
> > > What tools / languages are you using?
> >
> > I'm using python to read binary source files and create the text files
> > containing the SQL. Then I'm running psql -f <file containing SQL>.
>
> If you're going out to the file system, I would use COPY of csv files (if
> the number of records per table is non-trivial). Is any bulk-loading
> python available?

psycopg2 has a copy_from function and (possibly pertinent in this case) a
copy_expert function, which allows the read buffer size to be specified.
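
Something along these lines might work -- a minimal sketch, not tested
against your schema: the table name ("things"), the columns, the file
names, and the connection string are all placeholders you'd replace:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=migrator")
try:
    with conn.cursor() as cur:
        # copy_from streams a tab-separated file through COPY FROM STDIN,
        # so the server never has to parse millions of INSERT statements.
        with open("things.tsv") as f:
            cur.copy_from(f, "things", columns=("id", "name"))

        # copy_expert takes an arbitrary COPY statement (CSV format here)
        # and lets you set the read buffer size explicitly, which may be
        # relevant to the out-of-memory failure described above.
        with open("things.csv") as f:
            cur.copy_expert(
                "COPY things (id, name) FROM STDIN WITH (FORMAT csv)",
                f,
                size=64 * 1024,  # read buffer size, in bytes
            )
    conn.commit()
finally:
    conn.close()

Either way, COPY moves the data through a fixed-size buffer rather than
accumulating parsed statements in memory, which is why it tends to scale
to row counts that make per-row INSERTs fall over.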