blueaxon@xxxxxxxxx (Paul R) wrote:
> I need some advice. I am new to PG8. I am working on a project that
> needs to load data daily into PostgreSQL from a text file. The text
> file is a 40MB comma-delimited file, with rows consisting of 20-30
> fields, and a couple of thousand rows.
>
> The data needs to be parsed, and if a row already exists it needs to
> be updated; otherwise a new record is created. My problem is with the
> parsing:
>
> 1 - What would be a good approach to this? Is PL/pgSQL suitable, or
> should I use PHP or another language to do it and run it as a cron
> job/service?

When I have done this sort of thing, I have used Perl to turn the data
into a file in a format that can be readily read in using the
PostgreSQL-specific "COPY" command.

If you like PHP or Python or Icon/sed/awk/SNOBOL, that's your call;
using a language you are familiar with to "parse" the text is likely to
work out better than using one you aren't familiar with...

> 2 - How about committing transactions? Would there be a problem if
> the commit is done at the end of all the updates?

No, no problem at all.

When we did the data conversion for the .ORG registry, it involved
splitting the data into 50,000-record "chunks", each loaded in using
COPY. The prime reason to split at 50,000 was that if problems were
encountered, it is a lot easier to edit a file with 50,000 records
using vi than one with 5 million records. I don't recall ever needing
to use that capability, but it gave everyone confidence to know it was
there...

-- 
select 'cbbrowne' || '@' || 'gmail.com';
http://cbbrowne.com/info/
/"\
\ /  ASCII RIBBON CAMPAIGN
 X   AGAINST HTML MAIL
/ \
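P.S. In case a worked example helps, a minimal sketch of that approach
in Python (rather than Perl) might look like the following. The input
file name, the chunk size, and the escaping/NULL conventions are
illustrative assumptions, not details from the .ORG conversion.

#!/usr/bin/env python
# Sketch: convert a comma-delimited dump into tab-delimited chunk files
# that COPY can read directly, 50,000 rows per chunk.
# "daily.csv", the output prefix, and the empty-field-means-NULL rule
# are assumptions for illustration only.
import csv

CHUNK_SIZE = 50000          # rows per chunk file
INPUT_FILE = "daily.csv"    # hypothetical input file name
OUTPUT_PREFIX = "chunk"     # hypothetical output file prefix

def escape(field):
    # Escape a field for PostgreSQL's COPY text format.
    if field == "":
        return "\\N"        # treat empty fields as NULL (an assumption)
    return (field.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

with open(INPUT_FILE, newline="") as src:
    reader = csv.reader(src)
    out, chunk_no, rows_in_chunk = None, 0, 0
    for row in reader:
        if out is None or rows_in_chunk >= CHUNK_SIZE:
            if out:
                out.close()
            chunk_no += 1
            rows_in_chunk = 0
            out = open("%s_%04d.tsv" % (OUTPUT_PREFIX, chunk_no), "w")
        out.write("\t".join(escape(f) for f in row) + "\n")
        rows_in_chunk += 1
    if out:
        out.close()

# Each chunk can then be loaded with something like:
#   COPY mytable FROM '/path/to/chunk_0001.tsv';
# or, from psql:
#   \copy mytable from 'chunk_0001.tsv'

Whether you commit each chunk separately or wrap the whole load in one
transaction is up to you; the chunking mainly buys you small, editable
files if one of the loads runs into bad data.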