On Tue, 22 Feb 2011, Adrian Klaver wrote:
We are going to need to see at least a sample of the actual data in one.csv that is causing the problem. You have an off by two error as you suggest, but that could actually have happened earlier in the row. For instance the well_finish_date would insert into lot_no because lot_no is TEXT and the date value at this point is just text. Same with date_cmplt_acc and block_no.
Adrian/Andy, The data came out of Access as comma-and-quote csv. I massaged it in emacs and sed to change the delimiter to a pipe rather than a comma and removed the double quotes. I cannot assume that each of the 80 problem rows suffer from the defect in the same place, so if there's a generic process I can apply it row-by-row. After all, 80 problem rows out of 110,752 is not bad. Here are the schema and the first row, in one.csv. Because I don't know if attachments are stripped off before the message is allowed to be distributed to the list subscribers, I'll just include both here. DDL: The column names are original (except for 'ref' that seems to be a reserved word), but I modifed the data types. CREATE TABLE water_well ( sequence_no TEXT PRIMARY KEY, well_log TEXT, app VARCHAR(20), notice_of_intent VARCHAR(6), waiver_no VARCHAR(30), date_log_rcvd DATE, date_log_rcvd_acc CHAR(1), site_type CHAR(1), work_type CHAR(1), work_type_rmks TEXT, proposed_use CHAR(1), drilling_method CHAR(1), sc TEXT, ha TEXT, twn VARCHAR(3), legal_twn VARCHAR(3), rng VARCHAR(3), legal_rng VARCHAR(3), sec TEXT, sec_quarters TEXT, legal_quarters TEXT, quarters_seq TEXT, ref TEXT, latitude NUMERIC(9,6), longitude NUMERIC(9,6), lat_long_src VARCHAR(5), lat_long_acc CHAR(1), owner_current TEXT, owner_address TEXT, owner_no TEXT, parcel_no TEXT, subdivision_name TEXT, lot_no TEXT, block_no TEXT, well_finish_date DATE, date_cmplt_acc CHAR(1), gravel_packed CHAR(1), depth_seal INTEGER, depth_drilled INTEGER, depth_bedrock INTEGER, aquifer_desc TEXT, depth_cased INTEGER, csng_diameter FLOAT, csng_reductions INTEGER, top_perf INTEGER, bottom_perf INTEGER, perf_intervals INTEGER, static_wl FLOAT, temperature FLOAT, yield FLOAT, drawdown FLOAT, hours_pumped FLOAT, test_method CHAR(1), qual_const_data CHAR(1), qual_lith_data CHAR(1), remarks TEXT, remarks_additional TEXT, contractor_lic_no VARCHAR(8), contractor_name TEXT, contractor_address TEXT, contractor_drlr_no VARCHAR(6), driller_lic_no VARCHAR(6), source_agency TEXT, user_id TEXT, date_entry DATE, update_user_id VARCHAR(16), date_update DATE, edit_status VARCHAR(16), well_start_date DATE, gravel_pack_top INTEGER, gravel_pack_bot INTEGER, utm_x NUMERIC(13,6), utm_y NUMERIC(13,6) ); Here's one.csv: 68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23 I hope you're able to see what I keep missing as the source of the problem. Rich -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general