
Re: Load data from a csv file without using COPY

On 06/25/2018 09:17 AM, Kevin Brannen wrote:
From: Ravi Krishna [mailto:srkrishna@xxxxxxxxx]
Sent: Tuesday, June 19, 2018 4:15 PM
To: Steve Atkins <steve@xxxxxxxxxxx>
Cc: PG mailing List <pgsql-general@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: Load data from a csv file without using COPY


If performance is relevant then your app should probably be using the COPY
protocol, not line-by-line inserts. It's supported by most PostgreSQL
access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark.
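
(For reference, DBD::Pg is one such library: it exposes the COPY protocol
directly via pg_putcopydata/pg_putcopyend. A minimal sketch, assuming a
hypothetical pipe-delimited export file "export.txt" and a hypothetical
table "wide_table":

    use strict;
    use warnings;
    use DBI;

    # Hypothetical DSN, table, and file names; adjust for your setup.
    my $dbh = DBI->connect('dbi:Pg:dbname=benchdb', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 1 });

    # Start a COPY, stream the file through it, then finish it.
    $dbh->do(q{COPY wide_table FROM STDIN WITH (FORMAT csv, DELIMITER '|')});
    open my $fh, '<', 'export.txt' or die "open export.txt: $!";
    $dbh->pg_putcopydata($_) while <$fh>;
    close $fh;
    $dbh->pg_putcopyend();    # rows are loaded when this returns
    $dbh->disconnect;

)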

Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in two different environments.

That is not clear from this post:

https://www.postgresql.org/message-id/em2345975f-0c51-42dd-a35f-ff88715e8bbb%40ravi-lenovo


"3. We now have a catch-22 situation.  Should we spend time porting the
app to PG without first verifying
     that PG can perform as well as DB2. In other words, if some sort of
testing rules out PG as a good
     replacement for DB2, why even bother to port.  Of course that does
not prove conclusively that if PG
     passes the test, then it would mean that the app will work just as
fine.  But at least basic test will tell
    that we are not on a wrong path.
4. What I am planning is:
     4.a Get a set of large tables exported as a pipe delimited text
file.
     4.b Load them in both DB2 and PG on a similar h/w
     4.c  Run OLAP queries.

4.b is to test i/o. Our app is sensitive to the load times and some of
the tables are really wide.
4.c is to test maturity of PG in handling complex OLAP SQLs. From what I
have read, while PG
      optimizer is very good in handling OLTP, it is not, as yet, as good
in OLAP queries.

"

I read that as: can Postgres replace DB2 without losing performance? To me, that would mean using whatever works best to get the job done.


---

Seems like a ~10-line Perl program could handle this very easily. Use the Text::CSV module to make handling the input easier. Prepare your insert statement, then execute it for each line of input as you pull it in. For slightly better performance, I'd probably use transactions, add a counter, and commit every 10,000 rows (or something like that). A sketch along those lines follows.
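
Something like this, say. All names here are hypothetical (database
"benchdb", table "wide_table" with three columns, input file
"export.txt"); it's a sketch of the approach, not a drop-in script:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use Text::CSV;

    # AutoCommit off so we control transaction boundaries ourselves.
    my $dbh = DBI->connect('dbi:Pg:dbname=benchdb', 'user', 'pass',
                           { AutoCommit => 0, RaiseError => 1 });

    # Pipe-delimited input, per the export format described upthread.
    my $csv = Text::CSV->new({ binary => 1, sep_char => '|' })
        or die Text::CSV->error_diag;

    my $sth = $dbh->prepare(
        'INSERT INTO wide_table (col1, col2, col3) VALUES (?, ?, ?)');

    open my $fh, '<', 'export.txt' or die "open export.txt: $!";
    my $rows = 0;
    while (my $row = $csv->getline($fh)) {
        $sth->execute(@$row);
        $dbh->commit if ++$rows % 10_000 == 0;  # commit every 10,000 rows
    }
    close $fh;
    $dbh->commit;     # flush the final partial batch
    $dbh->disconnect;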

Once you have that working with one DB, you can copy your program, change the DBD driver and the connect statement for the other DB, and try that one. Unless you want to be really clever and make the same program do both, picking the DB with a command-line switch. :)
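
The clever version only needs the connect line to vary. A sketch of that,
again with hypothetical DSNs (the --db switch and DSN strings are
illustrative, adjust them to your setup):

    use strict;
    use warnings;
    use DBI;
    use Getopt::Long;

    # Pick the target RDBMS from a hypothetical --db switch (pg or db2).
    GetOptions('db=s' => \my $db) or die "usage: $0 --db pg|db2\n";
    my $dsn = ($db // 'pg') eq 'db2'
            ? 'dbi:DB2:benchdb'         # requires DBD::DB2
            : 'dbi:Pg:dbname=benchdb';  # requires DBD::Pg
    my $dbh = DBI->connect($dsn, 'user', 'pass',
                           { AutoCommit => 0, RaiseError => 1 });
    # ...same prepare/execute/commit loop as above...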

HTH,
Kevin


--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx



