Re: Comments on draft-dusseault-caldav-15 and draft-newman-i18n-comparator-14

John C Klensin <john-ietf@xxxxxxx> · Tue, 26 Sep 2006 08:57:57 -0400

--On Monday, 25 September, 2006 11:07 -0700 Lisa Dusseault
<lisa@xxxxxxxxxxxxxxxxx> wrote:

> On Sep 23, 2006, at 2:20 AM, Julian Reschke wrote:

>> But as a matter of fact, draft-newman-i18n-comparator-14
>> doesn't define any collations that would actually solve the
>> Unicode NF issue, so it's not really clear how this helps
>> CalDAV (except that it now uses a framework in which the
>> solution may become available in the future).

Please watch for the final version of draft-iab-idn-nextsteps
(probably to be posted as RFC 4690 within the next few days) and
for draft-???-idnabis-issues-00 (soon).  Neither "solves" the NF
problem, but they may help make it more clear why the NF problem
is not solvable in any general case.  It can be solved for
particular languages or, more specifically, particular
orthographies of particular languages.  But, as long as we are
operating at the "Unicode" level, without specific
language-identifying information transmitted in-band every time
we transmit a string, there is no general solution.

Fortunately, I don't believe that issue is in the critical path
of the base comparator document.

>> Maybe the set of initial registrations in
>> <http://tools.ietf.org/html/draft-newman-i18n-comparator-14#section-9>
>> needs to be extended?
> 
> Yes, I agree. That's one of the next steps and why a registry
> was created (so we could do it outside the base comparator
> draft).
> 
> Last week Ted & I were discussing whether one could define a
> Very Liberal Comparator (VLC) for general use.  It would be
> handy to have one which matched e with E, é, è, É... and
> matched o with O, ø, ô, and so on.  That would help in
> calendar searching use cases, e.g. a user who can't type in
> accents (or doesn't know how) wants to find the invitation
> from André by searching for "andre".  It would probably be
> useful in many other cross-language or unknown-language
> situations too.

Arggh.  The difficulty here is that, for some scripts and
languages, a "decorated" version of a base character can be, by
convention or the natural properties of the language, replaced
by an undecorated version.  For others, the decorations actually
form different characters, with different phonetics, names, and
other properties (Unicode character names do not consistently
reflect this distinction, partially because it is impossible to
do so).  To pull two examples out of idn-nextsteps, a very
liberal comparator should let "ö" and "ø" match for some
well-known Scandinavian languages and neither should match "o".
But, in German, "ö" should generally match "oe", but not
vice-versa.  Perhaps it should match "o" as well, but that would
be controversial.
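
To make the distinction concrete, here is a minimal Python sketch.  It
is not taken from idn-nextsteps or from the comparator draft.  The
function names and the TAILORINGS table are hypothetical, and the two
tailorings are only the examples given above, not real collation data.
It contrasts a language-blind fold (decompose, drop combining marks,
casefold) with a fold that needs to be told the language.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD and drop combining marks (language-blind)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def blind_key(text: str) -> str:
    """Match key for a 'liberal' comparator that knows no language."""
    return strip_marks(text).casefold()


# Hypothetical per-language tailorings, for illustration only.
TAILORINGS = {
    "de": {"ö": "oe"},  # German: "ö" is conventionally written "oe"
    "da": {"ö": "ø"},   # Danish: treat "ö" and "ø" as the same letter
}


def tailored_key(text: str, lang: str) -> str:
    """Match key that depends on out-of-band language information."""
    table = TAILORINGS.get(lang, {})
    folded = unicodedata.normalize("NFC", text).casefold()
    return "".join(table.get(ch, ch) for ch in folded)


print(blind_key("André") == blind_key("andre"))
print(blind_key("ö") == blind_key("o"), blind_key("ø") == blind_key("o"))
print(tailored_key("ö", "da") == tailored_key("ø", "da"))
print(tailored_key("Öl", "de"))
```

The blind fold finds "André" from "andre", but it also equates "ö"
with "o" while leaving "ø" alone, since "ø" has no canonical
decomposition.  That is the opposite of the Scandinavian expectation
described above.  The tailored fold gets those cases right only
because the caller supplies the language, and a symmetric key like
this still cannot express "matches oe, but not vice-versa".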

This set of problems actually gets worse as one moves outside
Roman-derived scripts, even though the Roman-derived scripts
probably have the richest collection of characters whose glyph
forms are decorated versions of other characters.

So, by all means do this if you think it is useful -- and I
agree that it might be -- but please give it a value-neutral
name, not, e.g., "very liberal".

Again, not in the critical path for the comparator document, IMO.

> Such a comparator would be most useful for exact and substring
> matches; I don't know offhand how it would best do ordering so
> it might not be as useful for ordering.

Ordering is even more tightly tied to the "different character"
versus "decorated version of existing character where the
decorations are semi-optional" distinction.  I suggest that
trying to do it in a general way will lead either to frustration
or to serious errors.
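
As a small illustration of the ordering problem, the Python sketch
below sorts the same three words two ways.  The word list and helper
name are made up for the example.  It relies on the conventions that
Swedish places "ö" at the end of the alphabet, after "z", while German
dictionary order files "ö" with "o".

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD and drop combining marks (language-blind)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zon", "öl", "ost"]

# Plain code point order: "ö" (U+00F6) sorts after "z".
by_codepoint = sorted(words)

# Order after stripping decorations: "öl" is compared as "ol".
by_stripped = sorted(words, key=lambda w: strip_marks(w).casefold())

print(by_codepoint)  # ['ost', 'zon', 'öl']
print(by_stripped)   # ['öl', 'ost', 'zon']
```

For this list the first order happens to agree with the Swedish
convention and the second with the German one.  Neither is right for
both, and a string that carries no language information cannot say
which one the user expects.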

> I believe Arnt intends to continue working on this general
> problem, for which I'm very grateful, and other contributions
> would be most welcome.

I very much appreciate his efforts in the area, wish him luck,
and hope that the community will be tolerant of efforts that
meet specific needs and are clearly identified with those needs.

    john
