update gnu classpath to Unicode 4.0.0

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[I don't read the address the private mail was sent to as often, but am
still subscribed to classpath via a newer address.  Hopefully this is okay
to forward to the list.]

According to Anthony Balkissoon on 1/31/2006 11:43 AM:
> Hi Eric,
> 
> I noticed you were the one that wrote the GNU Classpath scripts to read
> the Unicode data and generate the java files needed.  I want to update
> these to Unicode 4.0.0 but am unsure what this entails.  Could you
> perhaps suggest what you think will need to be done to make the scripts
> useable for 4.0.0, which assigns code points for supplementary
> characters?

I didn't write unicode-muncher.pl, just updated it from Unicode 3.0 to
3.1.  I think the bulk of the work will just be making sure that you
enumerate all code points up to U+10FFFF, instead of stopping at 64k.  I'm
not sure if you will have to shuffle any bitfield sizes to fit the
additional data, but I do recall that the script looped through several
block sizes to choose the size that resulted in the smallest table.
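
A minimal sketch of that kind of block-size search, written in Java rather
than Perl.  It assumes a two-level lookup (an index entry per block, with
identical blocks stored only once), which may not match what the script
actually does.  The class and method names are hypothetical, and the data
comes from the running JDK's Character.getType rather than from the
Unicode data files.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not taken from unicode-muncher.pl.
final class BlockSizeSearch {
    // Entries needed by a two-level table with blocks of (1 << shift)
    // entries: one index entry per block, plus each distinct block once.
    static int tableSize(char[] data, int shift) {
        int blockSize = 1 << shift;
        int blocks = (data.length + blockSize - 1) / blockSize;
        Set<String> unique = new HashSet<>();
        for (int b = 0; b < blocks; b++) {
            int from = b * blockSize;
            // copyOfRange zero-pads a short final block.
            char[] block = Arrays.copyOfRange(data, from, from + blockSize);
            unique.add(new String(block));
        }
        return blocks + unique.size() * blockSize;
    }

    // Try every shift in [minShift, maxShift] and keep the smallest table.
    static int bestShift(char[] data, int minShift, int maxShift) {
        int best = minShift;
        int bestSize = tableSize(data, minShift);
        for (int s = minShift + 1; s <= maxShift; s++) {
            int size = tableSize(data, s);
            if (size < bestSize) {
                best = s;
                bestSize = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Cover every code point through U+10FFFF, not just the first 64k.
        char[] data = new char[Character.MAX_CODE_POINT + 1];
        for (int cp = 0; cp < data.length; cp++) {
            data[cp] = (char) Character.getType(cp);
        }
        int shift = bestShift(data, 1, 16);
        System.out.println("shift " + shift
                + ", entries " + tableSize(data, shift));
    }
}
```

The only change needed to go past 64k in this sketch is the length of the
data array; the search itself does not care how many code points it is
given.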

> 
> Also, was the encoding used (lower 5 bits for type of character, upper 9
> bits for offset into attribute tables) something you designed yourself
> or is that something that came from the Unicode specs/docs?

Inherited from the previous coder.  In fact, I liked gcj's
unicode-muncher.pl so much at the time that I rewrote Jikes' Unicode
parser (written in Java instead of Perl) to use the same packing concept.
The differences are that Jikes needed less information (it was only
checking which characters were legal in identifiers, instead of a
full-blown implementation of java.lang.Character), and that by writing it
in Java, then running against Sun's JDK, I picked up Sun's interpretation
of Unicode 4.0, instead of parsing Blocks-4.0.0.txt myself; but ISTR that
the results turned out the same.  Before I stepped down as a Jikes
maintainer, I managed to update Jikes' version of the script to build
a packed table based on Unicode 4.0:
http://cvs.sourceforge.net/viewcvs.py/jikes/jikes/src/gencode.java?rev=1.14&view=markup

Hopefully that can give you some ideas of where to proceed with the Perl
script.
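
For reference, a rough sketch of the packing described in the question
(lower 5 bits for the character type, upper 9 bits for the offset into the
attribute tables).  It assumes each entry is a 16-bit Java char, so the
offset starts at bit 7 and the two bits in between are simply left unused
here.  The class and method names are hypothetical and are not copied from
Classpath or Jikes.

```java
// Hypothetical illustration of the 5-bit type / 9-bit offset packing.
final class PackedEntry {
    static final int TYPE_BITS = 5;
    static final int TYPE_MASK = (1 << TYPE_BITS) - 1;   // 0x1F
    static final int OFFSET_BITS = 9;
    // Assumption: 16-bit entries, so the upper 9 bits start at bit 7.
    static final int OFFSET_SHIFT = 16 - OFFSET_BITS;    // 7

    // Combine a character type and an attribute-table offset into one char.
    static char pack(int type, int offset) {
        if (type < 0 || type > TYPE_MASK) {
            throw new IllegalArgumentException("type needs 5 bits: " + type);
        }
        if (offset < 0 || offset >= (1 << OFFSET_BITS)) {
            throw new IllegalArgumentException("offset needs 9 bits: " + offset);
        }
        return (char) ((offset << OFFSET_SHIFT) | type);
    }

    // Lower 5 bits: the character type (e.g. Character.UPPERCASE_LETTER).
    static int type(char entry) {
        return entry & TYPE_MASK;
    }

    // Upper 9 bits: the offset into the attribute tables.
    static int offset(char entry) {
        return entry >>> OFFSET_SHIFT;
    }

    public static void main(String[] args) {
        char entry = pack(Character.UPPERCASE_LETTER, 300);
        System.out.println(type(entry) + " " + offset(entry));
    }
}
```

Whether 9 bits of offset are still enough once the supplementary
characters are included is exactly the bitfield question raised earlier;
the range check in pack() would at least make an overflow obvious.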

> 
> Thanks for any help you can offer!
> 
> --Tony
> 
> 

- --
Life is short - so eat dessert first!

Eric Blake             ebb9@xxxxxxx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFD50yj84KuGfSFAYARAnKXAJ9N26VY4vkh+FdlOpfHFaTM7Y7kAQCgtPHk
no7WAhhnkuLRKVHDEdf+zPo=
=fguU
-----END PGP SIGNATURE-----