Re: Hostnames, IDNs, Punycode and Unicode Case Folding

David G Johnston <david.g.johnston@xxxxxxxxx> · Mon, 29 Dec 2014 17:01:44 -0700 (MST)

On Mon, Dec 29, 2014 at 4:51 PM, Mike Cardwell [via PostgreSQL] <[hidden email]> wrote:

	* on the Mon, Dec 29, 2014 at 04:23:22PM -0700, David G Johnston wrote:

>>>    WHERE hostname='nißan.com'

>>>

>> 

>> _IF_ Postgres had a punycode function, then you could use:

>> WHERE punycode(hostname) = punycode('nißan.com')

> 

> If the OP wraps what he is doing up into a function that is what you end up

> getting: a memoized punycode function.

> 

> http://en.wikipedia.org/wiki/Memoization
> 

> It has to be defined as volatile but basically write the function to check

> for the provided input on the indexed table and if it doesn't exist the

> function will calculate the punycode value and store it onto the table

> before returning the punycode value to the caller.
I'm not sure all that is necessary. It could be quite a simple function,
like the lower() function. So what I would do is this:

CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames (lower(punycode_encode(hostname)));

That would prevent adding more than one representation for the same hostname

to the column.

Except two different hostname can resolve to the same punycode_encode(hostname) value so the unique index won't work.
It was also mentioned that using the Perl encoding function was non-performant; which is why caching the data into a memoization table has value.

WHERE lower(punycode_encode(hostname)) = lower(punycode_encode('any-representation'))

I'm not for knowing the rules of punycode but I'm not seeing what value lower() provides here...

There doesn't need to be any extra table storage for the punycode encoded

version.

David J.

View this message in context: Re: Hostnames, IDNs, Punycode and Unicode Case Folding

Sent from the PostgreSQL - general mailing list archive at Nabble.com.