Re: regex match and special characters

On 08/16/2018 03:59 AM, Alex Kliukin wrote:
Hi,

Here is a simple SQL statement that gives different results on PostgreSQL 9.6 and PostgreSQL 10+. The space character at the end of the string is actually U+2006 SIX-PER-EM SPACE (http://www.fileformat.info/info/unicode/char/2006/index.htm)

test=# select 'abcd ' ~ 'abcd\s';
  ?column?
----------
  t
(1 row)

test=# select version();
                                              version
-------------------------------------------------------------------------------------------------
  PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
(1 row)


On another server (running on the same system on a different port)

postgres=# select version();
                                             version
-----------------------------------------------------------------------------------------------
  PostgreSQL 9.6.9 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
(1 row)

postgres=# select 'abcd ' ~ 'abcd\s';
  ?column?
----------
  f
(1 row)

For both clusters, the client encoding is UTF8, the database encoding and collation are UTF8 and en_US.utf8 respectively, and lc_ctype is en_US.utf8. I access the databases locally, after first ssh-ing to the host.

I observed similar behavior on other Linux-based servers running Ubuntu: in all cases the regex evaluated to true on PostgreSQL 10+ and to false on earlier versions (down to 9.3). The query comes from a table check that suddenly stopped accepting rows valid in the older version during the migration. Changing it to select 'abcd ' ~ E'abcd\\s' doesn't change the outcome, unsurprisingly.

Is it reproducible for others here as well? If so, is there a way to make both versions behave the same?

select version();
version
------------------------------------------------------------------------------------
PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit


lc_collate | en_US.UTF-8
lc_ctype                            | en_US.UTF-8


test=# select 'abcd'||chr(2006) ~ E'abcd\s';
 ?column?
----------
 f
(1 row)
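
A caveat on the test above: chr() in PostgreSQL takes a decimal code point, so chr(2006) yields U+07D6 rather than U+2006 (which is decimal 8198). Also, inside an E'' string the unrecognized escape \s is taken as a literal s, so the pattern probably becomes 'abcds'. The result here may therefore not be comparable with the original report. Below is a minimal Python sketch for checking the code point arithmetic. It is an illustration only: Python's re module is not PostgreSQL's regex engine, and the names SIX_PER_EM_SPACE, DECIMAL_2006 and describe are made up for this example.

```python
import re
import unicodedata

# U+2006 SIX-PER-EM SPACE, the character from the original report.
SIX_PER_EM_SPACE = "\u2006"

# What a *decimal* argument of 2006 produces: code point 0x07D6, not 0x2006.
DECIMAL_2006 = chr(2006)


def describe(ch: str) -> str:
    """Return the hex code point and Unicode general category of one character."""
    return f"U+{ord(ch):04X} category={unicodedata.category(ch)}"


print(describe(SIX_PER_EM_SPACE))   # a space separator (category Zs)
print(describe(DECIMAL_2006))       # a different character, not whitespace
print(ord(SIX_PER_EM_SPACE))        # decimal value to pass to a decimal chr()

# Python's own \s (not PostgreSQL's) treats U+2006 as whitespace.
print(bool(re.search(r"abcd\s", "abcd" + SIX_PER_EM_SPACE)))
print(bool(re.search(r"abcd\s", "abcd" + DECIMAL_2006)))
```

Under these assumptions, a closer reproduction on the PostgreSQL side would likely use chr(8198) together with a pattern written as 'abcd\s' or E'abcd\\s'.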

In your example you are working on Postgres devel. Have you tried it on Postgres 10 and/or 11?


Cheers,
Alex




--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx



