Re: URGENT: temporary table not recognized?

Steve Crawford <scrawford@xxxxxxxxxxxxxxxxxxxx> · Fri, 06 Jan 2012 16:44:12 -0800

On 01/06/2012 03:55 PM, Phoenix Kiula wrote:
...
In general, when you have data scrubbing issues like this, grep/sed/awk/...
are your friends. Clean it up then import it.
Thanks Steve.

The file has 350 million lines. Sed, Awk etc are a little painful when
the file is 18GB witht hat many lines.

I'd want Postgresql to ignore the line altogether when something is
missing. Is this an option we can use, or are rules hoisted on us?

I've found grep, sed and friends to be quite effective and proper 
pre-cleaning to have a relatively minor impact on performance. Done 
properly, you will just be piping the data through a very simple 
grep/sed/awk/... into psql. No extra disk-reads, minimal memory use and 
a bit of CPU. And you will be in charge of deciding how suspect data is 
handled.

If by "rules hoisted" you mean "will PostgreSQL make arbitrary and 
possibly incorrect assumptions to attempt to force bad data into a 
table" then the answer is "no". In fact, it has become more and more 
picky over time. Trust me, at some point you will thank it for doing so.

For example the following used to "work":
select current_date < 2020-01-01;

But it returned "false" which was probably not what the user wanted. 
(2020-01-01 is the integer 2018 which PostgreSQL interpreted as being 
less than current_date). But it sure looks OK at first glance when you 
really meant '2020-01-01'::date.

In current versions, that statement will throw an error just at 
PostgreSQL does with dates like February 31 and a variety of other 
things that certain other DBMS' deem good.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general