Search Postgresql Archives

Re: Finding Errors in .csv Input Data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 22 Feb 2011, Adrian Klaver wrote:

We are going to need to see at least a sample of the actual data in
one.csv that is causing the problem. You have an off by two error as you
suggest, but that could actually have happened earlier in the row. For
instance the well_finish_date would insert into lot_no because lot_no is
TEXT and the date value at this point is just text. Same with
date_cmplt_acc and block_no.

Adrian/Andy,

  The data came out of Access as comma-and-quote csv. I massaged it in emacs
and sed to change the delimiter to a pipe rather than a comma and removed
the double quotes.

  I cannot assume that each of the 80 problem rows suffer from the defect in
the same place, so if there's a generic process I can apply it row-by-row.
After all, 80 problem rows out of 110,752 is not bad.

  Here are the schema and the first row, in one.csv. Because I don't know if
attachments are stripped off before the message is allowed to be distributed
to the list subscribers, I'll just include both here.

DDL:

  The column names are original (except for 'ref' that seems to be a
reserved word), but I modifed the data types.

CREATE TABLE water_well (
  sequence_no TEXT PRIMARY KEY,
  well_log TEXT,
  app VARCHAR(20),
  notice_of_intent VARCHAR(6),
  waiver_no VARCHAR(30),
  date_log_rcvd DATE,
  date_log_rcvd_acc CHAR(1),
  site_type CHAR(1),
  work_type CHAR(1),
  work_type_rmks TEXT,
  proposed_use CHAR(1),
  drilling_method CHAR(1),
  sc TEXT,
  ha TEXT,
  twn VARCHAR(3),
  legal_twn VARCHAR(3),
  rng VARCHAR(3),
  legal_rng VARCHAR(3),
  sec TEXT,
  sec_quarters TEXT,
  legal_quarters TEXT,
  quarters_seq TEXT,
  ref TEXT,
  latitude NUMERIC(9,6),
  longitude NUMERIC(9,6),
  lat_long_src VARCHAR(5),
  lat_long_acc CHAR(1),
  owner_current TEXT,
  owner_address TEXT,
  owner_no TEXT,
  parcel_no TEXT,
  subdivision_name TEXT,
  lot_no TEXT,
  block_no TEXT,
  well_finish_date DATE,
  date_cmplt_acc CHAR(1),
  gravel_packed CHAR(1),
  depth_seal INTEGER,
  depth_drilled INTEGER,
  depth_bedrock INTEGER,
  aquifer_desc TEXT,
  depth_cased INTEGER,
  csng_diameter FLOAT,
  csng_reductions INTEGER,
  top_perf INTEGER,
  bottom_perf INTEGER,
  perf_intervals INTEGER,
  static_wl FLOAT,
  temperature FLOAT,
  yield FLOAT,
  drawdown FLOAT,
  hours_pumped FLOAT,
  test_method CHAR(1),
  qual_const_data CHAR(1),
  qual_lith_data CHAR(1),
  remarks TEXT,
  remarks_additional TEXT,
  contractor_lic_no VARCHAR(8),
  contractor_name TEXT,
  contractor_address TEXT,
  contractor_drlr_no VARCHAR(6),
  driller_lic_no VARCHAR(6),
  source_agency TEXT,
  user_id TEXT,
  date_entry DATE,
  update_user_id VARCHAR(16),
  date_update DATE,
  edit_status VARCHAR(16),
  well_start_date DATE,
  gravel_pack_top INTEGER,
  gravel_pack_bot INTEGER,
  utm_x NUMERIC(13,6),
  utm_y NUMERIC(13,6)
);

  Here's one.csv:

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23

  I hope you're able to see what I keep missing as the source of the
problem.

Rich

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux