How To: LARGE html text or csv file COPY FROM?

Lou Picciano <loupicciano@xxxxxxxxxxx> · Wed, 15 Sep 2010 13:03:16 +0000 (UTC)

Hello Friends,
We're trying something for the first time: a COPY into a database from a text (or CSV) file containing one really, really big field of HTML.

The field holds the content of complete web pages, which we later need to analyze, slice, dice, etc., so it's verbatim HTML, with all the carriage returns, spaces, linefeeds(?) and double quotes included!

Problem is: with the very first record, the COPY command hiccups with a 'missing data for column' error.
In CSV mode, it's 'extra data after last expected column' (yes, using different input files for the test).

Both errors above make sense to me: in each case COPY is running into either a carriage return or a tab character inside the field.
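
To make that concrete, here is a minimal Python sketch of how a raw HTML field could be escaped for COPY's default text format, where backslash, tab, newline and carriage return inside a value have to be backslash-escaped. The function name escape_copy_text and the sample row are made up for illustration; this is not something we have run against our real data.

```python
def escape_copy_text(value: str) -> str:
    """Escape one field for COPY's default (tab-delimited) text format."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
             .replace("\t", "\\t")   # tab is the default column delimiter
             .replace("\n", "\\n")   # newline is the row terminator
             .replace("\r", "\\r")   # carriage return also ends a row
    )


# Hypothetical sample: an id column plus one HTML column.
html = '<html>\r\n<body class="x">\tHello</body>\r\n</html>'
row = "\t".join(["1", escape_copy_text(html)])
print(row)
```

After escaping, the only literal tab left in the row is the one separating the two columns, and the row sits on a single physical line.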

Q: Is there a way to handle this directly, as a PG import?

Meanwhile, we're off to use grep/gawk to remove all the carriage returns in the field.
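
A possible alternative to stripping the carriage returns, sketched in Python: write the file with every field quoted and embedded double quotes doubled, which is the form COPY's CSV mode is documented to accept, line breaks inside quoted values included. The helper to_csv_line and the sample values are hypothetical, and the COPY statement in the comment uses a made-up table name and path.

```python
import csv
import io


def to_csv_line(fields):
    """Render one row as CSV with every field quoted and inner quotes doubled."""
    buf = io.StringIO(newline="")  # newline="" keeps embedded \r\n untouched
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


# Hypothetical sample row: an id plus verbatim HTML with quotes, CR/LF and a comma.
html = '<p class="intro">line one\r\nline two, with a comma</p>'
line = to_csv_line(["1", html])
print(line)

# Assumed load step (table name and path are placeholders):
#   COPY pages (id, html) FROM '/path/to/pages.csv' WITH CSV;
```

The field still spans two physical lines in the file, but because it is wrapped in double quotes a CSV parser should treat it as a single value.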

TIA for any help, inspiration, recipes (or time in the stocks).     Lou