
Re: to_tsvector in 8.2.3


 



postgres=# select to_tsvector('test text');
  to_tsvector
---------------
 'test text':1
(1 row)
OK, that's related to the http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h commit. Thomas pointed out that the character may be a non-breaking space (0xa0), and that commit assumes that, with C locale and a multibyte encoding, any character > 0x7f is alpha.
To check this theory, please apply the attached patch.

If that is the case, I'm confused: we cannot assume that 0xa0 is a space character in every multibyte encoding, even on Windows.
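
For illustration only, the check being discussed can be sketched as a standalone function. This is a simplified sketch, not the actual parser.c code: the names sketch_isalnum and sketch_self_check and the patched flag are hypothetical, and the locale and encoding tests that guard this branch in the real parser are left out. It only shows why a 0xa0 character between two words is treated as part of one word before the patch and as a separator after it.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Hypothetical stand-in for the parser's alnum test in the
 * "C locale + multibyte encoding" case. Not the real tsearch2 function.
 *
 * patched == 0: behaviour described for the commit (anything > 0x7f is alpha)
 * patched != 0: behaviour with the test patch (0xa0 is not alpha)
 */
static int
sketch_isalnum(unsigned int c, int patched)
{
	if (c > 0x7f)
	{
		if (patched && c == 0xa0)
			return 0;		/* treat 0xa0 as a non-word character */
		return 1;			/* every other high character counts as alpha */
	}

	/* plain ASCII: defer to the C library */
	return isalnum(0xff & c) != 0;
}

/* Small self-check showing the difference for 0xa0 only. */
static void
sketch_self_check(void)
{
	assert(sketch_isalnum(0xa0, 0) == 1);	/* unpatched: glued into the word */
	assert(sketch_isalnum(0xa0, 1) == 0);	/* patched: acts as a separator */
	assert(sketch_isalnum(0xe9, 0) == 1);	/* other high characters unchanged */
	assert(sketch_isalnum(0xe9, 1) == 1);
	assert(sketch_isalnum(' ', 1) == 0);	/* ASCII space was never alpha */
}
```

If the separator in the test string really is 0xa0, the unpatched branch would explain why both words end up in a single lexeme at position 1.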



--
Teodor Sigaev                                   E-mail: teodor@xxxxxxxxx
                                                   WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2/wordparser/parser.c.orig	Wed Mar 21 20:41:23 2007
--- ./contrib/tsearch2/wordparser/parser.c	Wed Mar 21 21:10:39 2007
***************
*** 124,130 ****
--- 124,134 ----
  			 * with C-locale is an alpha character
  			 */
  			if ( c > 0x7f )
+ 			{
+ 				if ( c == 0xa0 )
+ 					return 0;
  				return 1;
+ 			}
  
  			return isalnum(0xff & c);
  		}
***************
*** 157,163 ****
--- 161,171 ----
  			 * with C-locale is an alpha character
  			 */
  			if ( c > 0x7f )
+ 			{
+ 				if ( c == 0xa0 )
+ 					return 0;
  				return 1;
+ 			}
  
  			return isalpha(0xff & c);
  		}
