On 09/16/2011 04:42 PM, Rich Shepard wrote:
On Thu, 15 Sep 2011, Andy Colson wrote:
First you need to trim the \n and spaces:
andy=# insert into junk values (E'GW-22');
INSERT 0 1
andy=# insert into junk values (E'GW-22 \n');
INSERT 0 1
andy=# insert into junk values (E'GW-22 \n');
Andy,
Here's what worked for me:
nevada=# \i junk.sql
CREATE TABLE
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22');
INSERT 0 803
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22 \n');
INSERT 0 0
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22 \n');
INSERT 0 0
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22\n');
INSERT 0 1409
nevada=# select '['|| rtrim(trim(trailing E'\n' from site_id)) || ']' from junk;
 ?column?
----------
[GW-22]
[GW-22]
and so on for 2212 rows.
Trim it up:
andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;
If you have a unique index you'll wanna drop it first. Once you get that done, we can remove the dups.
No index on junk; I can remove it from chemistry prior to reinserting the
cleaned rows.
Also, where can I read about the select syntax you use? I find nothing
about it in Rick van der Lans' 4th edition, the most comprehensive language
reference I've read.
Thanks,
Rich
The fine online manual:
http://www.postgresql.org/docs/current/interactive/index.html
Especially the string ops:
http://www.postgresql.org/docs/current/interactive/functions-string.html
Trim it up:
andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;
Andy,
Scrolling through the table with rows ordered by date and chemical I find
no duplicates ... so far. However, what I do find is that the above did not
work:
No, it wasn't supposed to. A select statement builds a new result set and returns it to you; it won't update a table. That select statement was meant as an example for writing an update statement.
Like:
update chemistry set site_id = rtrim(trim(trailing E'\n' from site_id));
If there was a unique index on chemistry(site_id), the above would throw an error, so I was warning you to drop it.
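A minimal sketch of that update, wrapped in a transaction so the result can be inspected before it is committed. It assumes site_id is a text or varchar column. The two-argument form rtrim(site_id, E' \n') is swapped in here because it strips any trailing mix of spaces and newlines in one call; it is not the expression used above.

```sql
begin;

-- Read-only check: how many rows carry trailing spaces or newlines?
select count(*)
  from chemistry
 where site_id <> rtrim(site_id, E' \n');

-- Rewrite only those rows; rows that are already clean are left alone.
update chemistry
   set site_id = rtrim(site_id, E' \n')
 where site_id <> rtrim(site_id, E' \n');

-- Bracket the values to confirm nothing trails the id any more.
select distinct '[' || site_id || ']' from chemistry order by 1;

commit;  -- or "rollback;" if the bracketed output looks wrong
```

Restricting the update with the where clause avoids rewriting rows that do not change.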
Once the site_id was trimmed, you could then delete the dups, with:
delete from chemistry where site_id = 'GW-22' and ctid <> (select min(ctid) from chemistry where site_id = 'GW-22');
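Note that the delete above keeps exactly one row for 'GW-22'. If chemistry holds many distinct measurements per site, a duplicate probably has to match on more than site_id. The following is only a sketch under that assumption: sample_date and param are hypothetical column names standing in for whatever columns identify one measurement, and it relies on ctid values being comparable with ">".

```sql
begin;

-- sample_date and param are hypothetical column names; substitute the
-- columns that together identify one measurement in chemistry.
-- Rows with NULL in any compared column never match and are kept.
delete from chemistry a
 using chemistry b
 where a.site_id     = b.site_id
   and a.sample_date = b.sample_date
   and a.param       = b.param
   and a.ctid        > b.ctid;   -- keep the copy with the lowest ctid

-- Should return no rows if every duplicate is gone.
select site_id, sample_date, param, count(*)
  from chemistry
 group by site_id, sample_date, param
having count(*) > 1;

commit;  -- or "rollback;" if the check still lists rows
```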
Those 11 steps you had... I was thinking two steps. The update and the delete above.
Sorry, I should have been a little clearer, but at least you got things cleaned up. PG has a huge number of data manipulation functions. If you have to export data out of a database in order to massage it, then that's a failure of the database. PG (and SQL) were meant for just this kind of job.
-Andy