Re: How to do an UPDATE for all the fields that do NOT break a constraint?

Matthias Karlsson <matthias@xxxxxxx> · Mon, 26 Jan 2009 15:45:15 +0100

On Mon, Jan 26, 2009 at 2:53 PM, Phoenix Kiula <phoenix.kiula@xxxxxxxxx> wrote:
> On Mon, Jan 26, 2009 at 9:45 PM, Matthias Karlsson <matthias@xxxxxxx> wrote:
>> On Mon, Jan 26, 2009 at 2:09 PM, Phoenix Kiula <phoenix.kiula@xxxxxxxxx> wrote:
>>> I wonder if this is an SQL limitation or something I'm missing in the
>>> PG manual, but I need to run an update on my database (to replace the
>>> value of a column to match a new design structure).
>>>
>>> Due to the new business logic, the  replaced value of a field may end
>>> up being already present in the database in another record. This leads
>>> to unique key violations when I run the update.
>>>
>>> My question: I don't mind if the update transaction skips the records
>>> where the key would be violated (this preservation is in fact what we
>>> want) but these are only about 2% of the overall updatable records.
>>>
>>> Is there anyway to make the transaction go through with the remaining
>>> 98% of the update SQL which will in fact NOT violate the unique
>>> constraint?
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>
>> You could always extend your update statement to include an additional
>> check to see if there are already rows present with the same value in
>> the field you are talking about.
>>
>> // Matthias
>>
>
>
>
> Thanks Matthias, but this seems a little recursive to me and I don't
> know how to do the SQL.
>
> Here is my SQL thus far. The table is "testimonials". Basically the
> column "user_alias" needs to be replaced to delete any mention of a
> user's "api_key". Both of these are fields in the same table, hence
> the replace logic below.
>
> As you will also see, based on our business logic, I have already kept
> all the related IDs in a separate small table called
> "testimonials_temp". This should speed up the process quite a bit
> because instead of going through 5 million IDs, we just loop through
> around 400,000.
>
>
>
> update testimonials
> set user_alias = replace(user_alias, '-'||api_key, '')
> where
>  id in (select id from testimonials_temp)
> ;
>
>
>
> The problem is that after being replaced like that the "user_alias"
> column has a problem, because some user_aliases already exist. How
> should I add a check condition recursively? I tried this:
>
>
> update testimonials
> set user_alias = replace(user_alias, '-'||api_key, '')
> where
>      id in (select id from testimonials_temp)
>  and replace(user_alias, '-'||api_key, '') not in (select user_alias
> from links where user_alias = ?????))
> ;
>
>
>
> Hope I have explained this clearly. Would appreciate any ideas!
>

My idea was very similar to the SQL at the end of your post. Wouldn't
something like this work?

update testimonials u
set u.user_alias = replace(u.user_alias, '-'||api_key, '')
where
u.id in (select id from testimonials_temp) and
not exists (select id testimonials where user_alias =
replace(u.user_alias, '-'||api_key, '')))
?

Not sure if this exact SQL is correct, but in your not in expression,
you just need to make sure to refer to the user_alias of the current
row being updated.

// Matthias

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general