On 12/19/2012 01:06 PM, Kirk Wythers
wrote:
Thanks for the reply Steve. These suggestions are new to me, so I'd like to rephrase them back to you in order to make sure I understand the bits and details.You have a mixture of bash and PostgreSQL/psql issues. Bash first: You don't need to wrap the inputfile glob in quotes. Your filespec will be expanded to a list of the files matching your pattern. I'm assuming that you have not populated your directory with filenames that contain spaces. If so, either correct it or start Googling for how to deal with lists of space-containing file names. Also, you will have problems with using single-quotes within a string delimited by single-quotes. Combine that with the psql problem where it appears you have to specify delimiter before CSV (though you probably don't need to specify delimiter at all if your delimiter is a comma) and things will blow up. Try: .... -c "\copy rawinput FROM stdin with delimiter as ',' NULL AS 'NA ' CSV HEADER"
I had assumed you had fully duplicated rows in which you could simply do; INSERT INTO realtable SELECT DISTINCT * FROM rawinput; (assuming realtable and rawinput have exactly the same structure and column ordering, of course) If that is not the case, you will need to update your select statement to return one from among your duplicated IDs based on your desired criteria. Something like but not exactly. The syntax is CREATE UNLOGGED TABLE... Except for UNLOGGED before TABLE it's exactly like a regular create-table statement. BEWARE: Unlogged is *NOT* a magic go-fast directive that comes at no cost. In normal tables data is written to the transaction log allowing the database to recover in the event of an unplanned shutdown. Unlogged tables, as the name implies, skip the writing of the write-ahead log gaining speed at the expense of data safety. For bulk-load operations that can be redone if they fail this is a reasonable approach. For normal operations where you expect your data to be well protected it is not. Depending on a variety of factors the speed improvement you will see ranges from "minimal" to "substantial" I would make rawinput exactly the same as your real table except without indexes or constraints. The easiest way is probably: CREATE UNLOGGED TABLE rawinput as SELECT * FROM realtable limit 0; Cheers, Steve |