On Jul 2, 2006, at 6:13 AM, Martijn van Oosterhout wrote:
But I don't think anyone is actually considering importing ICU into the
postgres source tree, are they?
Why not?
Size - I'm not sure this is relevant since I don't think we want to
incorporate it into postgres itself, just let people use it if they
have it. In any case, though, the default dataset is 8MB. This includes
support for every locale and charset it knows about.
If you drop the conversion stuff (because postgres already has that)
you're down to about 4MB.
Why would you drop the ICU transcoding support instead of the existing
postgres functions? Why the duplicated effort?
Well, the Japanese think that UTF8 is not the solution to all their
worries, so they won't be happy with a UTF8-only solution. Likewise,
those of us who only need single-byte character sets won't be very
happy with being forced to accept multi-byte processing overhead.
I've not quite understood the Japanese problem with Unicode. My
understanding is that it was primarily due to widespread use of broken
converters.
Certain Japanese characters cannot make a reliable round-trip through
Unicode - the classic example is the wave dash, which some converter
tables map to U+301C and others to U+FF5E, so the bytes can come back
changed. ICU uses UTF-16 as its internal store, so the Japanese folks
won't be happy with an ICU-only solution. However, it would still be of
great benefit to let ICU handle as much as possible, leaving the string
encodings to the encoding experts.
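
To make the failure mode concrete, here's a minimal sketch (mine, not
anything in the tree) of checking whether a byte sequence survives a
trip through ICU's UTF-16 pivot, using ICU's public C converter API
(ucnv_open/ucnv_toUChars/ucnv_fromUChars). Note a single ICU converter
is usually self-consistent; the historical breakage comes from encoding
with one vendor's mapping table and decoding with another's:

    /* Compile with something like: cc roundtrip.c -licuuc */
    #include <stdio.h>
    #include <string.h>
    #include <unicode/ucnv.h>

    /* Returns 1 if src survives charset -> UTF-16 -> charset intact,
     * 0 if the bytes changed, -1 on converter error. */
    static int round_trips(const char *charset, const char *src,
                           int32_t len)
    {
        UErrorCode err = U_ZERO_ERROR;
        UChar pivot[256];
        char back[256];
        UConverter *cnv = ucnv_open(charset, &err);

        if (U_FAILURE(err))
            return -1;

        /* bytes -> UTF-16, then the same converter back to bytes */
        int32_t ulen = ucnv_toUChars(cnv, pivot, 256, src, len, &err);
        int32_t blen = ucnv_fromUChars(cnv, back, 256, pivot, ulen, &err);
        ucnv_close(cnv);

        if (U_FAILURE(err))
            return -1;
        return blen == len && memcmp(src, back, len) == 0;
    }

    int main(void)
    {
        /* 0x81 0x60 is the Shift-JIS wave dash, the classic
         * problem byte pair. */
        const char wave_dash[] = { (char)0x81, (char)0x60 };

        printf("round-trips: %d\n",
               round_trips("shift_jis", wave_dash, 2));
        return 0;
    }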
At the very least, it would be great to have ICU handle encoding on
a per-column basis (perhaps extending the text datatype with encoding
info). Perhaps this would be a decent stopgap solution? The backend
protocol would also need a version bump: currently it converts all
strings to a single encoding.
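
To illustrate what "extending the text datatype with encoding info"
might mean, here is a purely hypothetical layout - nothing like this
exists in postgres today, and the struct and field names are made up:

    #include <stdint.h>

    /* Hypothetical per-column-encoding text value: the payload
     * carries a tag naming the encoding it is stored in, so
     * conversion can happen lazily at the edges instead of once
     * for the whole connection. */
    typedef struct EncodedText
    {
        int32_t  vl_len;      /* total size, as in a normal varlena
                               * header */
        int16_t  encoding_id; /* hypothetical: an index into
                               * pg_conversion or a table of ICU
                               * converter names */
        char     data[];      /* raw bytes in that encoding */
    } EncodedText;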
AgentM
agentm@xxxxxxxxxxxxxxxxxxxxx