Text search parser's treatment of URLs and emails

Hi,

I noticed that if I run this:

SELECT alias, description, token FROM
ts_debug('http://www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary');

I get:

  alias   |  description  |                                    token
----------+---------------+------------------------------------------------------------------------------
 protocol | Protocol head | http://
 url      | URL           | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary
 host     | Host          | www.postgresql.org:2345
 url_path | URL path      | /directory/page.html?version=9.1&build=alpha1#summary
(4 rows)


It could be me being picky, but I don't regard query-string parameters or
page fragments as part of the URL path.  Ideally, I'd expect something like:

    alias     |  description  |                                    token
--------------+---------------+------------------------------------------------------------------------------
 protocol     | Protocol head | http://
 url          | URL           | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary
 host         | Host          | www.postgresql.org
 port         | Port          | 2345
 url_path     | URL path      | /directory/page.html
 query_string | Query string  | version=9.1&build=alpha1
 fragment     | Page fragment | summary
(7 rows)

... of course, that assumes there were support for query strings and page
fragments, which there isn't.  And if the parser were changed to match my
definition of a URL path, that would have to be considered a breaking
change.
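For comparison, here is a minimal Python sketch of the split I'd expect. It uses the standard library's urllib.parse.urlsplit as a stand-in; this is not how the PostgreSQL text search parser works, and the function name split_url is purely illustrative. It breaks the URL above into the protocol, host, port, path, query-string and fragment pieces from the expected output:

```python
from __future__ import annotations

from urllib.parse import urlsplit

# The URL from the ts_debug() example above.
URL = "http://www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary"


def split_url(url: str) -> dict[str, str]:
    """Break a URL into the components proposed above.

    urlsplit is only a stand-in here; the text search parser has
    its own tokenizer and does not produce these token types.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme + "://",
        # .hostname excludes the port (and lowercases the host).
        "host": parts.hostname or "",
        # .port is an int (or None), so convert it back to text.
        "port": str(parts.port) if parts.port is not None else "",
        "url_path": parts.path,
        "query_string": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    for alias, token in split_url(URL).items():
        print(f"{alias:<12} | {token}")
```

Here the path stops at "?", and the query string and fragment come back as separate pieces, which is the distinction I'm arguing "url_path" should make.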

But my main gripe is with the name "url_path".

Also:

SELECT alias, description, token FROM ts_debug('myname+priority@xxxxxxxxx');

Yields:

   alias   |   description   |       token
-----------+-----------------+--------------------
 asciiword | Word, all ASCII | myname
 blank     | Space symbols   | +
 email     | Email address   | priority@xxxxxxxxx
(3 rows)

The entire string I entered is a valid email address, and this style is
not uncommon.  Shouldn't the parser take such address styles into account?
As it stands, the example above misidentifies the email address: it treats
"+" as a space symbol and only "priority@xxxxxxxxx" as the address, whereas
the real destination address would most likely be
myname@xxxxxxxxxx
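To illustrate, a minimal Python sketch that separates the sub-address tag from the mailbox name. It assumes "+" as the delimiter, which is a widespread mail-server convention rather than a universal rule, and uses example.com as a placeholder domain; the function name split_subaddress is hypothetical:

```python
from __future__ import annotations


def split_subaddress(address: str, delimiter: str = "+") -> tuple[str, str, str]:
    """Split an address into (mailbox name, tag, domain).

    Assumes "+" separates the tag from the mailbox name; this is a
    common convention, not something every mail server implements.
    """
    # Split on the last "@" so the domain is everything after it.
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {address!r}")
    # Text before the first delimiter is the mailbox name;
    # the remainder (possibly empty) is the tag.
    base, _, tag = local.partition(delimiter)
    return base, tag, domain


if __name__ == "__main__":
    base, tag, domain = split_subaddress("myname+priority@example.com")
    # The address the mail would most likely be delivered to.
    print(f"{base}@{domain} (tag: {tag})")  # prints: myname@example.com (tag: priority)
```

The whole string is still one email address; the point is that "myname" belongs to it rather than being split off as a separate word.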

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

