On 7/2/06, Agent M <agentm@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Certain Japanese characters cannot make a reliable round-trip through Unicode. ICU uses UTF-16 as its internal store, so the Japanese folks won't be happy with an ICU-only solution.

Could you explain what you mean and what's special about those characters?

> However, it would still be of great benefit to allow ICU to handle as much as possible, leaving the string encodings to the encoding experts. At the very least, it would be great to have ICU handle encoding on a per-column basis (perhaps by extending the text datatype with encoding info). Perhaps this would be a decent stopgap solution? The backend protocol would also need a version bump: currently, it converts all strings to a single encoding.
Could you give an example of what that would look like in your opinion? I was thinking more along the lines of a setting in pg_hba.conf that controls whether the server uses something like ICU, at least as an intermediate solution.

Adding a "LOCALE" clause to a column definition (similar to the "ENCODING" clause of the "CREATE DATABASE" statement) would solve most (not all) problems with a default locale. There still might be some non-deterministic behaviour with operations between strings in different locales, but it's far from a showstopper.

t.n.a.
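
To illustrate the remark about operations between strings in different locales, here is a minimal sketch. It does not use ICU or any real collation library: `german_key` and `swedish_key` are hypothetical toy functions that model just one rule each (German dictionary order treats "ä" like "a"; the Swedish alphabet places "ä" after "z"). The point is only that the same pair of strings orders differently under the two rules, so a comparison between a "German" column and a "Swedish" column has no single obvious answer.

```python
A_UMLAUT = "\u00e4"  # "ä" as a single precomposed code point


def german_key(s: str) -> str:
    # Toy rule: sort "ä" as if it were "a".
    return s.replace(A_UMLAUT, "a")


def swedish_key(s: str) -> str:
    # Toy rule: sort "ä" after "z" by mapping it to "{",
    # the ASCII character immediately following "z".
    return s.replace(A_UMLAUT, "{")


WORDS = ["zon", A_UMLAUT + "rlig"]

german_order = sorted(WORDS, key=german_key)
swedish_order = sorted(WORDS, key=swedish_key)

print("German-style order: ", german_order)
print("Swedish-style order:", swedish_order)
```

With a per-column locale, the server would have to pick one of these rules (or reject the comparison) whenever values from both columns meet in a single operation.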