Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness

Mario Torre <neugens@xxxxxxxxxxxxxxxx> · Wed, 06 Dec 2006 14:21:17 +0100

On Wed, 06/12/2006 at 13.50 +0100, Mark Wielaard wrote:
> Hi Mario,

Hi Mark! Thank you for the quick reply!

> But in the old code the decomposing is done, although not really in the
> classpath case, only in the case of libgcj. Could you explain the
> difference between classpath/libgcj here and how this actually helps us?
> Can't we use the libgcj version?

Yes, in the case of classpath it is not done at all. In the case of gcj
it is done, but it is broken.

I'm not sure how badly broken it is. It may be just a matter of
outdated data (gcj uses UCD 3.0, jdk 1.5 uses UCD 4.0.0). The two are
not that different, just a few additions, so I guess this is only part
of the problem.

What is certain is that the method used by gcj is not complete.

The javadoc says that the following rules are defined:

* NO_DECOMPOSITION

jdk: accented characters will not be decomposed for collation.
gcj: no decomposition is performed at all.

* CANONICAL_DECOMPOSITION

jdk: characters that are canonical variants according to the Unicode
standard will be decomposed for collation. Used for accented characters.
gcj: reads the values from the canonical_decomposition array and uses
them to calculate the decomposition.

* FULL_DECOMPOSITION

jdk: Unicode canonical variants and Unicode compatibility variants will
be decomposed for collation.
gcj: does the same as before, using a different array.

The last method should be the "Compatibility decomposition" named in the
Unicode Standard, if I'm not wrong.
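
To make the difference between the three modes concrete, here is a
minimal sketch. It assumes java.text.Normalizer, which only exists from
Java 6 on, so it is not something the classpath or gcj code discussed
here relies on; the class name DecompositionDemo and its helper methods
are made up for illustration. NFD stands in for the canonical
decomposition and NFKD for the compatibility one.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

public class DecompositionDemo {

    // Canonical decomposition (Unicode NFD),
    // e.g. U+00E9 (e with acute) becomes U+0065 U+0301.
    static String canonical(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Compatibility decomposition (Unicode NFKD),
    // e.g. U+FB01 (the "fi" ligature) becomes the two letters "fi".
    static String compatibility(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Compare two strings with a locale collator under a given
    // decomposition mode (one of the three Collator constants).
    static int compare(int mode, String a, String b) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setDecomposition(mode);
        return c.compare(a, b);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";   // single code point
        String decomposed = "e\u0301";   // base letter + combining accent

        // The result for each mode depends on the runtime's collation
        // data, so it is printed here rather than assumed.
        System.out.println("NO_DECOMPOSITION:        "
            + compare(Collator.NO_DECOMPOSITION, precomposed, decomposed));
        System.out.println("CANONICAL_DECOMPOSITION: "
            + compare(Collator.CANONICAL_DECOMPOSITION, precomposed, decomposed));
        System.out.println("FULL_DECOMPOSITION:      "
            + compare(Collator.FULL_DECOMPOSITION, "\ufb01", "fi"));
    }
}
```

A complete implementation should treat the precomposed and decomposed
forms as equivalent once CANONICAL_DECOMPOSITION is set, and the
ligature as equivalent to "fi" only with FULL_DECOMPOSITION.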

What is clear to me is that we are doing the wrong thing here, as this
class and these methods are more complex than what we have (and I fear
another DecimalFormat...).

> It doesn't really add or remove functionality it seems. How is the user
> better of with this version than they were with the old one?

Actually, you are right: it only lets us say that we have 1.3
complete... as it stands it is of no use at all.

> If it helps you structure the code in a way that makes improving it better please do
> go for it.

This is the reason. It makes sense to have all this functionality in one
place, as it is related to just this class. Unless, of course, after
reading the code more closely and understanding it I find that even
Collator and RuleBasedCollator are wrong (I have no reason to think so
now, but I also know that this area is a bit obscure, the javadoc does
not help, and there are no effective tests in mauve).

> But if the functionality doesn't really change I am not sure

True, and the drawback is that it fools users into thinking that we have
implemented this functionality. I think I'll do as with DecimalFormat: I
will keep a local branch until all the functionality is in place and
then submit it for review.

The Unicode standard is well documented, I "only" have to find out how
it is implemented in the jdk.

> Mark

Ciao,
Mario
-- 
Lima Software, SO.PR.IND. s.r.l.
http://www.limasoftware.net/
pgp key: http://subkeys.pgp.net/

Please, support open standards:
http://opendocumentfellowship.org/petition/
http://www.nosoftwarepatents.com/