Search Postgresql Archives

Re: Role for CSV import

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 15 May 2011 11:56:38 -0600
Rob Sargent <robjsargent@xxxxxxxxx> wrote:

>Tarlika Elisabeth Schmitz wrote:
>> I have a database that will be populated solely by CSV import.
>> There are several CSV file formats, all denormalized.
>>
>> I have created interim tables which match the CSV file formats. An
>> insert trigger distributes the data to their appropriate destination
>> tables. The destination tables themselves have insert/update triggers
>> for automated data clean-up. Any unresolvable inconsistencies are
>> reported in a log table.
>>
>You seem to be writing denormalized import records for the sole
>purpose of writing other normalized records.

The data come in denormalized form. I have no influence on the incoming
format. The DB will be populated first with historical data, then by a
constant data stream.

>Have you you looked into
>writing a programme in a relatively high-level jdbc-friendly language
>which reads the csv file, normalizes the data (the code already in
>your triggers) and flushes on every say 1000 independent records? The

The historical data import is a one-off and import performance
practically irrelevant. Even the performance of the daily import is
irrelevant because the total volume will only be 1000 records when
no-one is accessing the DB anyway.

I know it's not just a performance question but the ER model is pretty
simple with only a dozen active tables.

>"clean-up" and logging might also be done by the import app (all
>depending on what's being cleaned up and logged 

The same entities keep on cropping up and they
might get modified over time: 1234 J Smith might become 1234 John
Smith; or a FK reference might change because two entities were
consolidated. Those amendments will have to be applied.

The only hairy bit is one piece of data that comes in textual form,
from which crucial information has to be extracted, converted to
integer and boolean columns, hopefully picking up misspellings etc.


-- 

Best Regards,
Tarlika Elisabeth Schmitz

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux