Search Postgresql Archives

Re: error-tolerant COPY FROM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



joolz@xxxxxxxxxxxxxxxxxxxxx ("Joolz") writes:

> Hello everyone,
>
> I'm building a postgresql db which will have to get lots of data
> from "the outside" (customers, that is). The db has lots of
> constraints, and I'm sure that our customers will offer lots of
> invalid information. We receive the information in csv format. My
> first thought was to read them into the database with COPY, but
> "COPY stops operation at the first error."
>
> What I need is an import where all valid lines from the csv files
> are read into the db, and I also get a logfile for all invalid
> lines, stating the line number plus the pg error message so I can
> see which constraint was violated.
>
> I can't think of a direct, elegant solution for this, does anyone
> have any suggestions? Thanks a lot!

First, load it into a table that has really, really _weak_
constraints, but which also adds a sequence column at the end that
gets automagically populated.

- The sequence gets you the "line numbers" you want.

- You then run queries that separate the "good" from the "crud."  

This might ultimately turn into having 9 tables:

 - The original data, that you NEVER touch again, as the "pristine"
   form to look at if you find a problem;

 - 3 tables that collect entries with 3 different sorts of problems
   that mandate discarding the data [well, sending it back for
   retreading...]

 - 4 tables that contain entries that could get rewritten in some
   automatic fashion based on 4 policies you discovered

 - 1 "final results" table that aggregates those 4 tables along with
   the entries that were OK, which are fine to use as the resulting data
   import that can safely go into your application.

I'm making up numbers, but you hopefully get the idea...
-- 
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux