Search Postgresql Archives

Re: Finding Errors in .csv Input Data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 22 Feb 2011, David Johnston wrote:

The data and table structure provided do not seem to correlate.

David,

  That's the problem. However, they should match since they came from the
same .mdb file.

Regardless, if you changed the delimiter to "|" from "," it is possible that
you converted an embedded "," in one of the textual fields into a "|" when
you should not have.

  This is why I worked out a processing flow that avoided this problem in
_almost_ every case. The 80 remaining rows do not appear to have this
problem. At least, it's not immediately obvious they do which is why I want
to learn how to identify where (and why) the data get out of sync with the
schema.

I suggest opening up a testing file (one with the 80 malformed records and
10 to 20 good/control records) in an Excel or equivalent spreadsheet and
import/text-to-columns using the "|" delimiter.  You will be able to
quickly see rows with extra columns and specifically where those extras
are originating.  Then you can decide on how to fix the problem.  I would
suggest manually changing each incorrect "|" into a "," as a form of
self-punishment for improper data conversion - but whatever works for you.

  Each of the 80 rows have extra columns, at least one.

  Here are the same two rows from the original .csv export and the
transformation:

68670,724,"",0,,11/27/1948,"D","N","N","","H","C","32031","087","N18","18N","E20","20E","07","","","","MD",39.44,119.77,"NV003","M","KAIPER, R L","","","","SIERRA MANOR","","",11/15/1948,"D","",,106,,"",106,6.62,0,60,102,1,12.00,,30.00,,,"B","G","G","AIR COMPRESSOR TESTED 30 GPM ALSO","","3","MEL MEYER","RT 1 BOX 10RENO,",,3,"NV003","JSWINGHOLM",1/16/2003,"",,"F",11/11/1948,,,261013.36,4369139.23

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23

  When I compared the two I saw no difference. Why the latter does not want
to fit the schema I don't know.

  I can send the schema again and the whole 80-row file to anyone willing to
hemp me find the problems.

Rich


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux