On 8/13/19 6:34 AM, Souvik Bhattacherjee wrote:
> If the selects are returning more than one row then you are already
> doing bulk inserts. If they are returning single rows or you want to
> batch them then you need some sort of code to do that. Something
> like (pseudo-Python code):
> attr2_vals= [(10, 20, 30, 40), (50, 60, 70, 80)]
> for val_batch in attr2_vals:
>     BEGIN
>     for id in val_batch:
>         insert into tab2 (attr1, attr2) (select attr1, attr2
>             from tab1 where attr2 = id)
>     COMMIT
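For reference, the quoted pseudo-code can be made concrete. A minimal runnable sketch of the same one-transaction-per-batch pattern, using Python's stdlib sqlite3 in place of Postgres (table and column names are from the thread; the sample data and driver choice are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (attr1 INTEGER, attr2 INTEGER)")
conn.execute("CREATE TABLE tab2 (attr1 INTEGER, attr2 INTEGER)")
# Sample data: three rows each for attr2 in (10, 20, 50).
conn.executemany("INSERT INTO tab1 VALUES (?, ?)",
                 [(i, v) for i in range(3) for v in (10, 20, 50)])
conn.commit()

attr2_vals = [(10, 20, 30, 40), (50, 60, 70, 80)]
for val_batch in attr2_vals:
    # One explicit transaction per batch: the context manager
    # commits on success and rolls back on error (BEGIN ... COMMIT).
    with conn:
        for id_ in val_batch:
            conn.execute(
                "INSERT INTO tab2 (attr1, attr2) "
                "SELECT attr1, attr2 FROM tab1 WHERE attr2 = ?",
                (id_,))

# All nine tab1 rows match one of the batched attr2 values.
print(conn.execute("SELECT COUNT(*) FROM tab2").fetchone()[0])  # 9
```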
For *EXP 1: inserts with multiple txn:*
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
attr2 = 10);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
attr2 = 20);
tab1 has ~6M rows, and attr2 takes only two values in tab1, which are
evenly distributed. So yes, I guess I'm already doing batching here.
Also, I ran the following two statements to see if their performance
is comparable. STMT 1 always runs faster on my machine, but the
difference is a couple of seconds at most.
STMT 1: select attr1, attr2 into tab2 from tab1;
STMT 2: insert into tab2 (select attr1, attr2 from tab1);
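Note that the two statements differ in what they require: SELECT ... INTO is Postgres-specific and creates tab2 itself, while INSERT ... SELECT needs tab2 to exist already. A small sketch of the distinction, using stdlib sqlite3 (SQLite has no SELECT ... INTO; CREATE TABLE ... AS SELECT is its rough equivalent, and the sample data is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (attr1 INTEGER, attr2 INTEGER)")
# Six rows, attr2 evenly split between 10 and 20 (as in the thread).
conn.executemany("INSERT INTO tab1 VALUES (?, ?)",
                 [(i, 10 if i % 2 == 0 else 20) for i in range(6)])

# STMT 1's SELECT ... INTO is Postgres-only; here we pre-create an
# empty tab2 with the same columns (WHERE 0 copies no rows) ...
conn.execute("CREATE TABLE tab2 AS SELECT attr1, attr2 FROM tab1 WHERE 0")
# ... then STMT 2 fills the pre-created table.
conn.execute("INSERT INTO tab2 SELECT attr1, attr2 FROM tab1")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM tab2").fetchone()[0])  # 6
```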
All I have left is:
select row_number() OVER (order by attr2, attr1) AS id, attr1,
attr2 into tab2 from tab1;
That will not create a serial type in the id column though. You can
attach a sequence to that column. Something like:
1) create sequence tab2_id start <max id + 1> owned by tab2.id;
2) alter table tab2 alter COLUMN id set default nextval('tab2_id');
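The row_number() numbering can be checked in isolation. A sketch using stdlib sqlite3 (window functions need SQLite >= 3.25; the sequence/DEFAULT steps above are Postgres-specific and not reproduced here, and the sample rows are an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (attr1 INTEGER, attr2 INTEGER)")
conn.executemany("INSERT INTO tab1 VALUES (?, ?)",
                 [(3, 20), (1, 10), (2, 10)])

# Assign ids in (attr2, attr1) order while copying into tab2;
# CREATE TABLE AS stands in for Postgres's SELECT ... INTO.
conn.execute("""
    CREATE TABLE tab2 AS
    SELECT row_number() OVER (ORDER BY attr2, attr1) AS id, attr1, attr2
    FROM tab1
""")
rows = conn.execute("SELECT id, attr1, attr2 FROM tab2 ORDER BY id").fetchall()
print(rows)  # [(1, 1, 10), (2, 2, 10), (3, 3, 20)]
```

The id column here is a plain integer; attaching the sequence and DEFAULT, as shown above, is what makes later inserts keep numbering automatically in Postgres.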
However, adding the serial id column via ALTER TABLE actually takes
more time than inserting the tuples, so the combined total is more
than double the time taken to insert the tuples into tab2 without a
serial id column.
Best,
-SB
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx