Search Postgresql Archives

Re: Finding Errors in .csv Input Data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wednesday, February 23, 2011 6:11:16 am Rich Shepard wrote:
> On Tue, 22 Feb 2011, David Johnston wrote:
> > The data and table structure provided do not seem to correlate.
> 
> David,
> 
>    That's the problem. However, they should match since they came from the
> same .mdb file.
> 
> > Regardless, if you changed the delimiter to "|" from "," it is possible
> > that you converted an embedded "," in one of the textual fields into a
> > "|" when you should not have.
> 
>    This is why I worked out a processing flow that avoided this problem in
> _almost_ every case. The 80 remaining rows do not appear to have this
> problem. At least, it's not immediately obvious they do which is why I want
> to learn how to identify where (and why) the data get out of sync with the
> schema.
> 
> > I suggest opening up a testing file (one with the 80 malformed records
> > and 10 to 20 good/control records) in an Excel or equivalent spreadsheet
> > and import/text-to-columns using the "|" delimiter.  You will be able to
> > quickly see rows with extra columns and specifically where those extras
> > are originating.  Then you can decide on how to fix the problem.  I
> > would suggest manually changing each incorrect "|" into a "," as a form
> > of self-punishment for improper data conversion - but whatever works for
> > you.
> 
>    Each of the 80 rows have extra columns, at least one.
> 
>    Here are the same two rows from the original .csv export and the
> transformation:
> 
> 68670,724,"",0,,11/27/1948,"D","N","N","","H","C","32031","087","N18","18N"
> ,"E20","20E","07","","","","MD",39.44,119.77,"NV003","M","KAIPER, R
> L","","","","SIERRA
> MANOR","","",11/15/1948,"D","",,106,,"",106,6.62,0,60,102,1,12.00,,30.00,,
> ,"B","G","G","AIR COMPRESSOR TESTED 30 GPM ALSO","","3","MEL MEYER","RT 1
> BOX
> 10RENO,",,3,"NV003","JSWINGHOLM",1/16/2003,"",,"F",11/11/1948,,,261013.36,
> 4369139.23
> 
> 68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.4
> 4|119.77|NV003|M|KAIPER, R L|||SIERRA
> MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR
> COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10
> RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23
> 
>    When I compared the two I saw no difference. Why the latter does not
> want to fit the schema I don't know.
> 
>    I can send the schema again and the whole 80-row file to anyone willing
> to hemp me find the problems.
> 
> Rich

As a quick test try quoting KAIPER, R L -->"KAIPER, R L". If that does not work 
you can send me the 80 line csv file and I will see if anything stands out.

-- 
Adrian Klaver
adrian.klaver@xxxxxxxxx

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux