-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[I don't read the address the private mail was sent to as often, but am
still subscribed to classpath via a newer address. Hopefully this is okay
to forward to the list.]

According to Anthony Balkissoon on 1/31/2006 11:43 AM:
> Hi Eric,
>
> I noticed you were the one that wrote the GNU Classpath scripts to read
> the Unicode data and generate the java files needed. I want to update
> these to Unicode 4.0.0 but am unsure what this entails. Could you
> perhaps suggest what you think will need to be done to make the scripts
> useable for 4.0.0, which assigns code points for supplementary
> characters?

I didn't write unicode_muncher.pl, just updated it from Unicode 3.0 to
3.1. I think the bulk of the work will just be making sure that you
enumerate all characters, instead of stopping at 64k. I'm not sure
if you will have to shuffle any bitfield sizes to fit the additional data,
but I do recall that the script looped through several block sizes to
choose the size that resulted in the smallest table.

>
> Also, was the encoding used (lower 5 bits for type of character, upper 9
> bits for offset into attribute tables) something you designed yourself
> or is that something that came from the Unicode specs/docs?

Inherited from the previous coder. In fact, I liked gcj's
unicode-muncher.pl so much at the time that I rewrote Jikes' unicode
parser (written in Java instead of Perl) to use the same packing concept.
The differences are that Jikes needed less information (it was only
checking which characters were legal in identifiers, instead of being a
full-blown implementation of java.lang.Character), and that by writing it
in Java, then running against Sun's JDK, I picked up Sun's interpretation
of Unicode 4.0, instead of parsing Blocks-4.0.0.txt myself; but ISTR that
the results turned out the same.
Before I stepped down as a Jikes maintainer, I managed to update the
jikes' version of the script to build a packed table based on Unicode 4.0:

http://cvs.sourceforge.net/viewcvs.py/jikes/jikes/src/gencode.java?rev=1.14&view=markup

Hopefully that can give you some ideas of where to proceed with the perl
script.

>
> Thanks for any help you can offer!
>
> --Tony
>
>

- --
Life is short - so eat dessert first!

Eric Blake             ebb9@xxxxxxx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFD50yj84KuGfSFAYARAnKXAJ9N26VY4vkh+FdlOpfHFaTM7Y7kAQCgtPHk
no7WAhhnkuLRKVHDEdf+zPo=
=fguU
-----END PGP SIGNATURE-----
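
For readers following the thread, here is a rough sketch of the two ideas discussed above: packing a 5-bit character type and a 9-bit attribute-table offset into one value, and trying several block sizes to find the smallest two-level table. It is not the code from unicode_muncher.pl or gencode.java. The function names, the choice of attributes (bidirectional class plus digit value), and the cost measure (number of table entries rather than bytes) are all assumptions made for illustration. It also uses whatever Unicode version ships with the Python interpreter, not Unicode 4.0.0.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # enumerate every code point, not just the first 64k
TYPE_BITS = 5              # low bits: character type
OFFSET_BITS = 9            # high bits: offset into an attribute table


def build_packed_values():
    """Return (values, categories, attributes) covering all code points.

    Hypothetical layout: values[cp] == (attribute_offset << 5) | type_index.
    """
    categories, attributes = [], []   # type table and attribute table
    cat_index, attr_index = {}, {}    # reverse lookups used while building
    values = []
    for cp in range(MAX_CODE_POINT + 1):
        ch = chr(cp)
        cat = unicodedata.category(ch)
        # Assumed attribute tuple; a real generator would store more fields.
        attr = (unicodedata.bidirectional(ch), unicodedata.digit(ch, -1))
        if cat not in cat_index:
            cat_index[cat] = len(categories)
            categories.append(cat)
        if attr not in attr_index:
            attr_index[attr] = len(attributes)
            attributes.append(attr)
        values.append((attr_index[attr] << TYPE_BITS) | cat_index[cat])
    # Fail loudly if the bitfields would need to be resized.
    if len(categories) > (1 << TYPE_BITS) or len(attributes) > (1 << OFFSET_BITS):
        raise ValueError("bitfields too small for the data")
    return values, categories, attributes


def split_into_blocks(values, shift):
    """Build a two-level table; identical blocks are stored only once."""
    size = 1 << shift
    index, data, seen = [], [], {}
    for start in range(0, len(values), size):
        block = tuple(values[start:start + size])
        if block not in seen:
            seen[block] = len(data)   # where this block starts in `data`
            data.extend(block)
        index.append(seen[block])
    return index, data


def best_shift(values, shifts=range(5, 11)):
    """Try several block sizes and keep the one with the fewest entries."""
    def cost(shift):
        index, data = split_into_blocks(values, shift)
        return len(index) + len(data)
    return min(shifts, key=cost)


def lookup(index, data, shift, cp):
    """Fetch the packed value for code point `cp` from the two-level table."""
    return data[index[cp >> shift] + (cp & ((1 << shift) - 1))]
```

To use it, call `build_packed_values()`, pass the result to `best_shift()`, build the tables with `split_into_blocks()`, and decode a looked-up value `v` with `v & 0x1F` for the type and `v >> 5` for the attribute offset. Counting entries instead of bytes keeps the sketch short; a real generator would probably weigh the element width of each table as well.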