Search Postgresql Archives

Re: Inexplicable duplicate rows with unique constraint

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



That's why I created a virtual_string function to squeeze out everything but alpha characters and numbers 0-9 from any varchar or text columns that I want to use as business key columns. For example, if I have a column named job_name, I will have a companion column named v_job_name. The v_ column is to replicate Oracle's virtual column, since postgres doesn't have it. You don't put any values in the v_ column directly. I simply have a trigger on insert or update to put the value in the v_job_name column using the virtual_string(new.job_name) function. It's the v_job_name column that use in my unique constraint so that I avoid any unexpected sorting. Meanwhile, my job_name column is still human readable with whatever characters I want to see, including diacritics.

Here is my function, if you want to try it out:

create or replace function store.virtual_string(string_in text)
returns text as
$body$
	declare
		l_return text;
	begin
l_return := regexp_replace (lower(unaccent(string_in)),'[^0-9a-z]','','g');
		return l_return;
	end;
$body$
  language plpgsql volatile security definer
;

Sue

---
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Susan E Hurst
Principal Consultant
Brookhurst Data LLC
Email: susan.hurst@xxxxxxxxxxxxxxxxxx
Mobile: 314-486-3261

On 2020-01-16 11:48, Tom Lane wrote:
Richard van der Hoff <richard@xxxxxxxxxx> writes:
On 16/01/2020 17:12, Magnus Hagander wrote:
See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
which linux distros updated when.

It seems like a plausible explanation but it's worth noting that all the
indexed data here is (despite being in text columns), plain ascii. I'm
surprised that a change in collation rules would change the sorting of
such strings, and hence that it could lead to this problem. Am I naive?

Unfortunately, strings containing punctuation do sort differently
after these changes, even with all-ASCII data.  The example given
on that wiki page demonstrates this.

RHEL6 (old glibc):

$ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
11
1-1

Fedora 30 (new glibc):

$ ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.utf8 sort
1-1
11

I concur with Daniel's suggestion that maybe "C" locale is
the thing to use for this data.

			regards, tom lane





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux