
Re: Finding Errors in .csv Input Data


 



On 02/22/2011 07:10 PM, Rich Shepard wrote:
I'm sure many of you have solved this problem in the past and can offer
solutions that will work for me. The context is a 73-column postgres table
of data that was originally in an Access .mdb file. A colleague loaded the
file into Access and wrote a .csv file for me to use since we have nothing
Microsoft here. There are 110,752 rows in the file/table. After a lot of
cleaning with emacs and sed, the copy command accepted all but 80 rows of
data. Now I need to figure out why postgres reports them as having too many
columns.

Starting to work with a single row, I first compared my cleaned row to the
raw .csv from the Access output. They match column-by-column. Then I copied
the schema to a text file and started comparing the .csv data
column-by-column. While this looks OK to me, postgres doesn't like it. For
example, I get this error message:

nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
ERROR: value too long for type character(1)
CONTEXT: COPY water_well, line 1, column gravel_packed: "106"

Yet, the column comparison for gravel_packed and surrounding attributes
does not show this:

lot_no TEXT, |
block_no TEXT, |
well_finish_date DATE, 11/15/1948|
date_cmplt_acc CHAR(1), D|
gravel_packed CHAR(1), |
depth_seal INTEGER, |
depth_drilled INTEGER, 106|
depth_bedrock INTEGER, |
aquifer_desc TEXT, |

Notice the NULL value for gravel_packed, while the "106" is for
depth_drilled, two columns later.

I'm stymied and wonder if there's a tool I can use to fix these 80 rows so
the copy command will accept them.

Rich


Can we see a few lines of one.csv?  If we ignore the CONTEXT line and just look at the error "too long for char(1)", it looks like there are only two columns to think about: date_cmplt_acc and gravel_packed.  Are you sure there are no extra spaces, tabs, or other weirdness in the file for those two columns?
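One way to check for that kind of weirdness is to print repr() of the suspect fields, which makes tabs, non-breaking spaces, and other invisible characters show up. This is just a sketch, not anything from Rich's actual file: the pipe delimiter comes from his \copy command, but the column positions here are made up, and you'd substitute the real 0-based positions of date_cmplt_acc and gravel_packed.

```python
import csv
import io

def inspect_fields(lines, indices, delimiter='|'):
    """Return repr() of chosen columns per row, so stray tabs or spaces
    that would overflow a CHAR(1) become visible.
    `indices` are 0-based column positions (put the real positions of
    date_cmplt_acc and gravel_packed here)."""
    out = []
    for lineno, row in enumerate(csv.reader(lines, delimiter=delimiter), 1):
        out.append((lineno, [repr(row[i]) for i in indices]))
    return out

# Toy row: column 1 carries a trailing tab that COPY would reject for CHAR(1).
sample = io.StringIO('x|D\t|y\n')
print(inspect_fields(sample, [1]))  # the 'D\t' in the output exposes the tab
```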

You might also consider dumping out insert statements.  Might be a little slower, but simpler to debug.
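A rough sketch of that idea: turn each delimited row into its own INSERT, so a bad row fails individually instead of aborting the whole COPY. The table name and quoting rules below are assumptions (it doesn't list the real 73 columns, and a real conversion would also need type-aware handling), but it shows the shape of the approach.

```python
import csv
import io

def rows_to_inserts(lines, table, delimiter='|'):
    """Turn delimited rows into one INSERT statement each; empty fields
    become NULL, and embedded single quotes are doubled for SQL."""
    for row in csv.reader(lines, delimiter=delimiter):
        vals = ', '.join(
            'NULL' if field == '' else "'" + field.replace("'", "''") + "'"
            for field in row)
        yield f'INSERT INTO {table} VALUES ({vals});'

# Toy row: second field has an embedded quote, third is empty (-> NULL).
sample = io.StringIO("1|O'Brien|\n")
for stmt in rows_to_inserts(sample, 'water_well'):
    print(stmt)  # INSERT INTO water_well VALUES ('1', 'O''Brien', NULL);
```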

Did Access dump out comma-separated output with quoted fields?  I've done this on several occasions and never had to do any cleaning.  But... I also use perl to split the csv and fire off inserts/copies/updates/etc.
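The same split-and-check approach works in any scripting language; here's a sketch in Python rather than perl. It counts fields per row using a CSV parser that honours quoting, so an embedded delimiter inside a quoted field isn't miscounted, and flags the rows COPY would reject for having too many columns. The expected count of 73 comes from Rich's table; the toy sample data is made up.

```python
import csv
import io

EXPECTED_COLUMNS = 73  # Rich's water_well table has 73 columns

def find_bad_rows(lines, expected=EXPECTED_COLUMNS, delimiter='|'):
    """Return (line_number, field_count, row) for every row whose field
    count differs from what COPY expects."""
    bad = []
    for lineno, row in enumerate(csv.reader(lines, delimiter=delimiter), 1):
        if len(row) != expected:
            bad.append((lineno, len(row), row))
    return bad

# Toy example with 4 expected columns: row 2 has a stray extra delimiter.
sample = io.StringIO('a|b|c|d\na|b|c|d|e\n')
print(find_bad_rows(sample, expected=4))
```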

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

