importing a messy text file

Hi,

I have a 56GB text file that I want to import into Postgres.
The file is tab-delimited and not quoted.
I deleted the header line with the column names (using sed) so that I could use COPY with the non-CSV text format (because some of the text values contain quotes).
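For reference, dropping the header was just something like this (the file name here is a placeholder for the real one):

```shell
# Delete the first line (the header row) in place.
# "bigfile.txt" is a placeholder; -i is GNU sed's in-place flag.
sed -i '1d' bigfile.txt
```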

I had some minor trouble with the file, which I managed to work around, but now I have a problem where I can't think of a solution, even though it seems so simple.

The problem is this:
there is a tab after the last column in many, but not all, records.
When I ran into the extra tab I added a dummy column to the destination table, but now COPY throws an error because the data for the dummy column is missing (on record ~275K of about 150M).

The file is too big to edit by hand, and it would probably not be feasible to manually add tabs to every record that misses one anyway, although I don't know how many that would be.
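One thing I suppose I could try is stripping the trailing tab in a stream, so the 56GB file never has to be rewritten on disk. A sketch, assuming GNU sed (which understands \t) and with made-up database, table, and file names:

```shell
# Delete a single trailing tab from each record and pipe the result
# straight into \copy, so the big file is never edited in place.
# Database, table, and file names are hypothetical.
sed 's/\t$//' bigfile.txt \
  | psql -d mydb -c "\copy mytable FROM STDIN WITH (FORMAT text)"
```

That would only handle the one irregularity I know about, of course, not any missing or extra tabs in the middle.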

I realize that there could be other showstoppers in the file, like missing or extra tabs in the middle of a record, but I would like to try to get this fixed.

Maybe it would be feasible to load every record as a single value and then split it into columns using Postgres's text-processing functions.
Or maybe there is an (undocumented?) option to COPY or \copy to ignore extra columns.
Or maybe there is some NoSQL tool where I could import this and structure the data before passing it to Postgres.
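The first idea might look roughly like this, I imagine: pick a delimiter that cannot occur in the data so each whole line lands in one text column, then split on tabs in SQL. A sketch only; the table and column names are made up, and the usual text-format backslash caveats would still apply:

```shell
# Load each raw line as one text value, then split it on tabs in SQL.
# All object names (staging, target, col1...) are hypothetical.
psql -d mydb <<'SQL'
CREATE TABLE staging (line text);
-- \x01 is unlikely to occur in the data, so the whole record,
-- tabs and all, ends up in the single "line" column.
\copy staging FROM 'bigfile.txt' WITH (FORMAT text, DELIMITER E'\x01')
-- A trailing tab just produces an empty final array element,
-- which can simply be ignored here.
INSERT INTO target (col1, col2, col3)
SELECT parts[1], parts[2], parts[3]
FROM (SELECT string_to_array(line, E'\t') AS parts FROM staging) s;
SQL
```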

Do you have any tips, please?

Cheers,


--
Willy-Bas Loos
