Search Postgresql Archives

Re: CSV-bulk import and defaults

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sunday 02 January 2011 2:22:14 pm Thomas Schmidt wrote:
>   Hello,
>
> well, I'm new to postgres and this is my post on this list :-)
> Anyway, I've to batch-import bulk-csv data into a staging database (as
> part of an ETL-"like" pocess). The data ought to be read via STDIN,
> however for keeping in simple and stupid, saving it to a file and
> importing afterwards is also an option. Sticking my nose into the docs,
> I noticed that copy[1] as well as pg_import[2] are able to do it.
>
> However, there are some additional columns of the staging table (job id,
> etc.) that have to be set in order to identify imported rows. These
> attributes are not part of the data coming from STDIN (since its
> meta-data) and I see no way for specifying default values for "missing"
> cvs columns. (imho copy and pg_bulkload will use table defaults for
> missing rows - do I miss something?).
>
> Thus - do you have any clue on designing an fast bulk-import for staging
> data?
>
> Thanks in advance,
> Thomas
>
> [1] http://www.postgresql.org/docs/9.0/static/sql-copy.html
> [2] http://pgbulkload.projects.postgresql.org/pg_bulkload.html

Check out pgloader:
http://pgloader.projects.postgresql.org/

In particular:

"user_defined_columns

    Those are special columns not found in the data file but which you want to 
load into the database. The configuration options beginning with udc_ are taken 
as column names with constant values. The following example define the column c 
as having the value constant value for each and every row of the input data 
file.

    udc_c = constant value

    The option copy_columns is used to define the exact columnsList given to 
COPY.

    A simple use case is the loading into the same database table of data coming 
from more than one file. If you need to keep track of the data origin, add a 
column to the table model and define a udc_ for pgloader to add a constant 
value in the database.

    Using user-defined columns require defining copy_columns and is not 
compatible with only_cols usage.
copy_columns

    This options defines the columns to load from the input data file and the 
user defined columns, and in which order to do this. Place here the column 
names separated by commas.

    copy_columns = b, c, d

    This option is required if any user column is defined, and conflicts with 
the only_cols option. It won't have any effect when used in a section where no 
user column is defined.
"

-- 
Adrian Klaver
adrian.klaver@xxxxxxxxx

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux