Search Postgresql Archives

Re: import CSV file to a table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mar 08, John McKown modulated:
...
> ​I agree. I went with a "pure BASH" approach because it is what the
> user asked for & I wasn't sure what language she might be comfortable
> with. I use PERL a lot. Or maybe I should say that I abuse PERL a lot.
> Such as a PERL script with writes out another PERL script, based on
> some input files & parameters, then runs the just written PERL script,
> which does the load into a PostgreSQL database (multiple tables). Ya, a
> bit perverted.​
> 

Well, you could follow a similar meta-programming/code-generating
pattern to have the BASH script output a single SQL file to run with
psql. You could even generate PL/pgsql code to defer more data
processing to the database itself.

I think the only robust "pure BASH" approach is to use a temporary
table, so you aren't trying to parse CSV content in BASH. Using csvkit
sounds good if you can introduce these third-party dependencies.

With the temporary table, you can use SQL for most validation or data
interrogation, but you need to know at least enough schema information
in advance to form the COPY statement. Parsing the CSV header row to
plan your work puts you right back to requiring a robust CSV parser
unless you can constrain your input scenarios to only handle very
trivial headers.

If you play games with a defaulting serial column and fixed column
names like "id, c1, c2, ..., cN" for the temporary table, you might
use the id column as a DB-assigned "row number" during COPY and
validation. In this case, you could even tell Postgres there is no
header, and then let it parse the header as another data record so you
can use SQL statements to determine the actual header names and
ordering in the input.  But this still requires knowing the column
count in advance of the COPY.

I also think using something like Python with structured data
processing would be wiser, unless you know enough about the schema in
advance to avoid any CSV parsing on the client side.


Karl


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux