blueaxon@xxxxxxxxx (Paul R) wrote:
> I need some advice. I am new to PG8. I am working on a project that
> needs to load data daily into PostgreSQL from a text file. The text
> file is a 40MB comma-delimited file, with rows consisting of 20-30
> fields, and a couple of thousand rows.
>
> The data needs to be parsed, and if a row already exists it needs to
> be updated; otherwise a new record is created. My problem is with the
> parsing:
>
> 1 - What would be a good approach to this? Is PL/pgSQL suitable, or
> should I use PHP or another language to do it and run it as a cron
> job/service?

When I have done this sort of thing, I have used Perl to turn the data
into a file in a format that can be readily read in using the
PostgreSQL-specific "COPY" command.

If you like PHP or Python or Icon/sed/awk/SNOBOL, that's your call;
using a language you are familiar with to "parse" the text is likely to
work out better than using one you aren't familiar with...

> 2 - How about committing transactions? Would there be a problem if
> the commit is done at the end of all the updates?

No, no problem at all.

When we did the data conversion for the .ORG registry, it involved
splitting the data into 50,000-record "chunks", each loaded in using
COPY. The prime reason to split at 50,000 was that if problems were
encountered, it is a lot easier to edit a file with 50,000 records
using vi than one with 5 million records. I don't recall ever needing
to use that capability, but it gave everyone confidence to know it was
there...

-- 
select 'cbbrowne' || '@' || 'gmail.com';
http://cbbrowne.com/info/
/"\
\ /  ASCII RIBBON CAMPAIGN
 X   AGAINST HTML MAIL
/ \
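P.S. In case a worked example helps, a minimal sketch of that approach
in Python (rather than Perl) might look like the following. The input
file name, the chunk size, and the escaping/NULL conventions are
illustrative assumptions, not details from the .ORG conversion.

#!/usr/bin/env python
# Sketch: convert a comma-delimited dump into tab-delimited chunk files
# that COPY can read directly, 50,000 rows per chunk.
# "daily.csv", the output prefix, and the empty-field-means-NULL rule
# are assumptions for illustration only.
import csv

CHUNK_SIZE = 50000          # rows per chunk file
INPUT_FILE = "daily.csv"    # hypothetical input file name
OUTPUT_PREFIX = "chunk"     # hypothetical output file prefix

def escape(field):
    # Escape a field for PostgreSQL's COPY text format.
    if field == "":
        return "\\N"        # treat empty fields as NULL (an assumption)
    return (field.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

with open(INPUT_FILE, newline="") as src:
    reader = csv.reader(src)
    out, chunk_no, rows_in_chunk = None, 0, 0
    for row in reader:
        if out is None or rows_in_chunk >= CHUNK_SIZE:
            if out:
                out.close()
            chunk_no += 1
            rows_in_chunk = 0
            out = open("%s_%04d.tsv" % (OUTPUT_PREFIX, chunk_no), "w")
        out.write("\t".join(escape(f) for f in row) + "\n")
        rows_in_chunk += 1
    if out:
        out.close()

# Each chunk can then be loaded with something like:
#   COPY mytable FROM '/path/to/chunk_0001.tsv';
# or, from psql:
#   \copy mytable from 'chunk_0001.tsv'

Whether you commit each chunk separately or wrap the whole load in one
transaction is up to you; the chunking mainly buys you small, editable
files if one of the loads runs into bad data.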