Thanks for the response. You are correct that the difference between AL32UTF8 and UTF8 is AL32UTF8's better support for supplementary characters.
If supplementary characters are inserted into an Oracle UTF8 database, they are treated as two separate, undefined characters occupying 6 bytes in storage. Oracle recommends using AL32UTF8 for any newly defined supplementary characters.
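For illustration, here is a minimal Python sketch of what that 6-byte form means for a migration. It assumes (hypothetically) that an export contains such surrogate-pair bytes; whether it does depends on the export tool and client character set. The helper name `repair_cesu8` and the sample bytes are made up for this example. A strict UTF-8 decoder rejects the 6-byte form, and a UTF-16 round trip can rejoin the two halves into one character:

```python
def repair_cesu8(raw: bytes) -> str:
    """Decode bytes that may contain 6-byte surrogate pairs into proper text.

    Hypothetical helper for illustration; not part of Oracle or PostgreSQL.
    """
    # 'surrogatepass' lets the UTF-8 codec accept encoded surrogate halves.
    halves = raw.decode("utf-8", "surrogatepass")
    # A UTF-16 round trip joins each high/low surrogate pair into one code point.
    # An unpaired surrogate would raise UnicodeDecodeError here.
    return halves.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# U+10400 stored as two 3-byte surrogate halves (assumed sample data).
legacy = bytes.fromhex("eda081edb080")

try:
    legacy.decode("utf-8")  # strict decoding, as a standards-conforming reader would do
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

fixed = repair_cesu8(legacy)
print(strict_ok, len(fixed), fixed.encode("utf-8").hex())
```

Data that is already standard UTF-8 passes through this helper unchanged, so it could be applied to a whole dump as a precaution, assuming the dump has no unpaired surrogates.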
Does PostgreSQL make a similar distinction within Unicode? We have not yet tested our Oracle AL32UTF8 databases on PostgreSQL. When creating a database in PostgreSQL, we see UTF8 as an encoding option, but not AL32UTF8.
Thanks,
Mridul.
From: Rajeshwar Bharathi [mailto:rajeshwarbharathi@xxxxxxxxx]
Sent: Wednesday, August 10, 2011 1:14 PM
To: Mridul Mathew
Subject: Fwd: Character set equivalent for AL32UTF8
---------- Forwarded message ----------
From: Craig Ringer <ringerc@xxxxxxxxxxxxx>
Date: Wed, Aug 10, 2011 at 11:49 AM
Subject: Re: Character set equivalent for AL32UTF8
To: pgsql.admin@xxxxxxxxxxxxxxxx
Cc: RBharathi <rajeshwarbharathi@xxxxxxxxx>, pgsql-admin@xxxxxxxxxxxxxx
On 2/08/2011 8:52 PM, RBharathi wrote:
Hi,
We plan to migrate data from Oracle 11g with character set AL32UTF8 to a Postgres db.
What is the equivalent character set to use in Postgres? We see only the UTF-8 option.
What's AL32UTF8? That's not a standard or widely recognised charset name. Is it an Oracle-specific feature? If so, what makes it different from UTF-8, and why do you need it?
Documentation link? References?
A 30-second Google search turned up this:
http://decipherinfosys.wordpress.com/2007/01/28/difference-between-utf8-and-al32utf8-character-sets-in-oracle/
"As far as these two character sets go in Oracle, the only difference between AL32UTF8 and UTF8 character sets is that AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate characters encoded using UTF-8 (or six bytes per character). Besides this storage difference, another difference is better support for supplementary characters in AL32UTF8 character set."
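The byte counts in that quote can be checked with a short Python sketch. This is only an illustration of the two encodings as the quote describes them, not Oracle's actual implementation; the function name `cesu8_encode` and the sample character are assumptions made for the example:

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text the way the quote describes Oracle's legacy "UTF8":
    each character above U+FFFF becomes two UTF-16 surrogates, and each
    surrogate is then written as a 3-byte UTF-8 style sequence.
    Hypothetical helper for illustration only."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for half in (high, low):
                # 'surrogatepass' allows encoding a lone surrogate half.
                out += chr(half).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


ch = "\U00010400"  # a supplementary character (above U+FFFF)
standard = ch.encode("utf-8")   # standard UTF-8, as AL32UTF8 is described
legacy = cesu8_encode(ch)       # surrogate-pair form, as Oracle "UTF8" is described

print(len(standard), standard.hex())  # 4 f0909080
print(len(legacy), legacy.hex())      # 6 eda081edb080
```

Characters at or below U+FFFF come out identical in both forms, which is why the difference only shows up for supplementary characters.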
Is this what you're talking about? If so, what's the concern? Have you checked to see if PostgreSQL's behavior fits your needs?
--
Craig Ringer
--
Rajeshwar BM
Bangalore INDIA