On 10/26/22 17:30, Siddharth Jain wrote:
Hello,
Given a long list like this:
1,2
2,4 --> 2 appears once
7,9
8,9
5,3
2,5 --> note 2 appears twice
I want to load it into this table:
create table tbl (
id integer primary key,
fids integer[]
)
so we will have 2 -> [4,5] where 2 is id and [4,5] are the fids
My actual dataset is very large, about 100M rows. How can it be
loaded into Postgres efficiently?
I think I have to use a program for this and am trying the pg
library for Node.js. I am reading the data in batches of, say, 1M
or 100k rows, and I have built a dictionary (hashmap) in Node.js that
accumulates the id -> fids pairs. The part where I am stuck is how to
generate the SQL command:
1. I want to make one call to the server for the batch of 1M rows, not
1M calls
2. Some keys in the hashmap might already exist in the database. For
these keys we want to append to the array
3. Some keys will not exist and for these we want to insert new rows
Can someone help me please? I am using Postgres for the first time.
The best way to deal with importing large batches of data is to use COPY
https://www.postgresql.org/docs/current/sql-copy.html
But that will not play well with modifying the data as you input it.
What I can see doing is:
1) COPY the data into a staging table:
create table staging_tbl (id integer, fid integer);
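For example, assuming the data is in a CSV file (the path here is just
a placeholder), from psql:

\copy staging_tbl (id, fid) from '/path/to/data.csv' with (format csv)

Server-side COPY ... FROM works the same way if the file is readable
by the server; \copy reads the file from the client side instead.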
2) Then use SQL statements to move the data to the final table.
As an example of one possibility, using ON CONFLICT from here:
https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

insert into tbl
  select id, array_agg(fid) from staging_tbl group by id
on conflict (id) do update set fids = tbl.fids || excluded.fids;

Note the GROUP BY: ON CONFLICT DO UPDATE cannot touch the same target
row twice in one statement, so ids that repeat within a batch (like
the 2 in your sample) have to be aggregated into one array first.
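Putting the two steps together, a per-batch loop could look like this
in psql (a sketch; the file name is a placeholder and I have not
tested it at 100M-row scale):

begin;
\copy staging_tbl (id, fid) from 'batch_0001.csv' with (format csv)
insert into tbl
  select id, array_agg(fid) from staging_tbl group by id
on conflict (id) do update set fids = tbl.fids || excluded.fids;
truncate staging_tbl;
commit;

Truncating the staging table between batches keeps each merge cheap.
With your six sample rows this ends with id 2 holding both 4 and 5.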
I would test with a smaller example data set to verify.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx