Thom Brown wrote: > Hi, > > I noticed that if I run this: > > SELECT alias, description, token FROM > ts_debug('http://www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary'); > > I get: > > alias | description | token > ----------+---------------+----------------------------------------------------------------- > protocol | Protocol head | http:// > url | URL | > www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary > host | Host | www.postgresql.org:2345 > url_path | URL path | > /directory/page.html?version=9.1&build=alpha1#summary > (4 rows) > > > It could be me being picky, but I don't regard parameters or page > fragments as part of the URL path. Ideally, I'd sort of expect: > > alias | description | token > --------------+---------------+----------------------------------------------------------------- > protocol | Protocol head | http:// > url | URL | > www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary > host | Host | www.postgresql.org > port | Port | 2345 > url_path | URL path | /directory/page.html > query_string | Query string | version=9.1&build=alpha1 > fragment | Page fragment | summary > (7 rows) > > ... of course that's if there was support for query strings and page > fragments, which there isn't. But if changes were made to support my > definition of a URL path, they'd have to be considered breaking > changes. > Wow, that is a tough one. One the one hand, it seems nice to be able to split stuff out more, but on the other hand we would be making url_path less useful because people would need to piece things together to get the old behavior. In fact to piece things together we would need to add '?' and '#' optionally, which seems kind of hard. Perhaps we should keep url_path unchanged and add file_path that has your suggestion. That would allow more fine-grained control without breaking backward compatibility. We already duplicate some data with url and url_path, so having file_path as another place we duplicate some seems OK. > But my main gripe is with the name "url_path". > > Also: > > SELECT alias, description, token FROM ts_debug('myname+priority@xxxxxxxxx'); > > Yields: > > alias | description | token > -----------+-----------------+-------------------- > asciiword | Word, all ASCII | myname > blank | Space symbols | + > email | Email address | priority@xxxxxxxxx > (3 rows) > > The entire string I entered is a valid email address, and isn't > totally uncommon. Shouldn't that take such email address styles be > taken into account? The example above incorrectly identifies the > email address since the real destination address would most likely be > myname@xxxxxxxxxx I had no idea '+' could be part of an email address, and in fact it is a modifier that is stripped off when delivering the email: http://my.brandeis.edu/bboard/q-and-a-fetch-msg?msg_id=0000Nu I didn't even know that was possible. It is used as an email delivery flag. I agree that needs to be corrected. We fixed URLs in 9.0 with: Use more standards-compliant rules for parsing URL tokens (Tom Lane) so I think it is reasonable to fix email addresses in 9.1. Care to submit a patch? You can lookup Tom's change as a guide. -- Bruce Momjian <bruce@xxxxxxxxxx> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general