Since bash has been bandied about in this thread I presume awk is available. Here's how I would check just how 'csv'ish the incoming file is.
If this doesn't produce one line you have to suspect quoted
values including commas (or what ever char you choose). then you
need a real csv parser. If just one line, I'ld use cut to get rid of unwanted
columns, then let COPY do it's thing On 03/08/2017 09:13 AM, Karl Czajkowski
wrote:
On Mar 08, John McKown modulated: ...I agree. I went with a "pure BASH" approach because it is what the user asked for & I wasn't sure what language she might be comfortable with. I use PERL a lot. Or maybe I should say that I abuse PERL a lot. Such as a PERL script with writes out another PERL script, based on some input files & parameters, then runs the just written PERL script, which does the load into a PostgreSQL database (multiple tables). Ya, a bit perverted.Well, you could follow a similar meta-programming/code-generating pattern to have the BASH script output a single SQL file to run with psql. You could even generate PL/pgsql code to defer more data processing to the database itself. I think the only robust "pure BASH" approach is to use a temporary table, so you aren't trying to parse CSV content in BASH. Using csvkit sounds good if you can introduce these third-party dependencies. With the temporary table, you can use SQL for most validation or data interrogation, but you need to know at least enough schema information in advance to form the COPY statement. Parsing the CSV header row to plan your work puts you right back to requiring a robust CSV parser unless you can constrain your input scenarios to only handle very trivial headers. If you play games with a defaulting serial column and fixed column names like "id, c1, c2, ..., cN" for the temporary table, you might use the id column as a DB-assigned "row number" during COPY and validation. In this case, you could even tell Postgres there is no header, and then let it parse the header as another data record so you can use SQL statements to determine the actual header names and ordering in the input. But this still requires knowing the column count in advance of the COPY. I also think using something like Python with structured data processing would be wiser, unless you know enough about the schema in advance to avoid any CSV parsing on the client side. Karl |