Re: [Last-Call] [art] Artart last call review of draft-ietf-jmap-contacts-06

John C Klensin <john-ietf@xxxxxxx> · Tue, 02 Apr 2024 23:12:43 -0400

--On Tuesday, April 2, 2024 18:21 -0700 Rob Sayre
<sayrer@xxxxxxxxx> wrote:

> On Tue, Apr 2, 2024 at 5:44 PM John C Klensin
> <john-ietf@xxxxxxx> wrote:
> 
>> It seems to me that your conclusion depends on an assumption
>> about "the JMAP community" that might be questionable.  An
>> analysis of the JMAP specification leads me to a slightly
>> different conclusion; inline below.

> At the risk of repeating myself, I don't think PRECIS really
> does the trick here, even though it's easy to see it does work
> sometimes (I went and looked at the "Referenced By" stuff in
> the datatracker). The document is "Preparation, Enforcement,
> and Comparison of Internationalized Strings Representing
> Usernames and Passwords".

Rob,  I didn't mean to say "require PRECIS", only that we are
under some obligation to do better than saying "UTF-8" and
giving length boundaries.  If PRECIS is a good answer, fine.  If
not, that is fine too, as long as we are clear about the
alternative.

Beyond that, as far as PRECIS is concerned, I'm not sure what you
are talking about because both RFC 7564 and 8264 are
"Preparation, Enforcement, and Comparison of Internationalized
Strings in Application Protocols".   RFC 8265 is about
"...Representing Usernames and Passwords", but that is
appreciably more restrictive.

> But there are other problems that will come up, like writing
> street addresses in Japanese, Thai, etc., as written here:
> 
> https://datatracker.ietf.org/doc/draft-ietf-calext-jscontact/
> 
> So, while PRECIS might apply cleanly to this draft, I think
> the JSON implementations will be facing much more varied
> content.

Ok.  But this is a Last Call on this document.  It isn't even
about "JSON implementations", it is about a very specific
application of JMAP.

> That's why I prefer the approach here:
> 
> https://datatracker.ietf.org/doc/draft-bray-unichars/
> 
> It's also realistic about the fact you will get the so-called
> "toxic waste" from whatever JavaScript's JSON.parse and
> JSON.stringify do. This way is even better as it relates to
> "any UTF-8 string", because the Unichars draft covers escape
> sequences. The issue in JSON or XML is that you can send
> perfectly valid UTF-8, but there might be escape sequences
> that represent total garbage from a Unicode perspective. The
> Unichars draft provides some usefully adversarial examples.

Understood.  I'm reluctantly going to respond briefly on this
thread but then think the discussion should go elsewhere since
we are not, AFAICT, even close to a Last Call on, e.g.,
draft-bray-unichars.
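
For concreteness, the escape-sequence point quoted above can be
sketched as follows.  This is only an illustration using Python's
standard json module; the payload and the field name "name" are
made up for the example and do not come from the JMAP drafts.  The
bytes on the wire are valid UTF-8 (plain ASCII, in fact), yet the
parsed value is a lone surrogate that cannot be written back out
as UTF-8.

```python
import json

# Hypothetical payload: pure ASCII, so it is also valid UTF-8 on the wire.
wire = b'{"name": "\\ud800"}'
wire.decode("utf-8")  # succeeds; nothing is wrong at the byte level

# The JSON escape decodes to a single unpaired (lone) surrogate.
value = json.loads(wire)["name"]
is_lone_surrogate = len(value) == 1 and 0xD800 <= ord(value) <= 0xDFFF

# A lone surrogate is not a Unicode scalar value, so strict UTF-8
# encoding of the parsed string is refused.
try:
    value.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(is_lone_surrogate, encodable)
```

A rule of the form "any valid UTF-8 string" checked only against
the bytes on the wire would not catch this case.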

I suggest that there is something of a spectrum with "just use
valid UTF-8" at one extreme and very restrictive,
application-specific and application-tuned, models like IDNA2008
at the other.  In between lie specifications that try to avoid
the worst problems one can get into with Unicode, including, in
no particular order, draft-bray-unichars,
draft-bormann-dispatch-modern-network-unicode, the (IMO) badly
outdated RFC 5198, and UTR#36 and/or UTS#39.   The PRECIS
framework document (RFC 8264), which is what I've had in mind
when I (and others) have said "just use PRECIS", is somewhat
further out on that spectrum although perhaps not much further
than UTS#39 and UTR#36 with a good choice of options.  And
PRECIS in the specific "Usernames and Passwords" (RFC 8265) and
"Nicknames" (RFC 8266) variations are somewhat further out, but
still not as far as IDNA2008.  Now, if we can agree that "just
use a UTF-8 string" is not good enough (whether STD 63 / RFC 3629
are explicitly referenced or not), the question is how far out
on that spectrum one should go in providing restrictions and/or
advice ... with the understanding that, while
draft-bray-unichars and
draft-bormann-dispatch-modern-network-unicode eliminate
considerable "toxic waste" and other problem-prone
constructions, going further with trash and risk reduction may
be appropriate, at least as advice, in many situations.  Those
I-Ds each specify subsets of the possible collection of
strings.  The more restrictive specs specify what should be
subsets of what those two allow (if they aren't proper subsets, I
think some careful review is in order).
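
The subset relationship can be sketched in code.  What follows is
a rough illustration only: the excluded ranges (surrogates, most
control codes, noncharacters) are an assumption about the general
kind of exclusion such I-Ds make, not a transcription of either
draft, and the "stricter" profile (the same exclusions plus an NFC
requirement) is hypothetical and stands in for something further
out on the spectrum.

```python
import unicodedata

def is_problem_code_point(cp: int) -> bool:
    """Illustrative exclusions; NOT the normative list from any draft."""
    if 0xD800 <= cp <= 0xDFFF:                      # surrogates
        return True
    if cp < 0x20 and cp not in (0x09, 0x0A, 0x0D):  # C0 controls except tab, LF, CR
        return True
    if 0x7F <= cp <= 0x9F:                          # DEL and C1 controls
        return True
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def in_base_subset(s: str) -> bool:
    """True if the string contains none of the excluded code points."""
    return not any(is_problem_code_point(ord(ch)) for ch in s)

def in_stricter_subset(s: str) -> bool:
    """Hypothetical stricter profile: base subset plus NFC normalization."""
    return in_base_subset(s) and unicodedata.normalize("NFC", s) == s

samples = ["東京都", "e\u0301", "a\u0000", "\ud800", "\ufffe"]
for s in samples:
    print(repr(s), in_base_subset(s), in_stricter_subset(s))
```

Because the stricter check is built on top of the base check,
every string it accepts is also accepted by the base check, which
is the property suggested above as worth reviewing across the
actual specifications.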

But, again, as far as this particular draft is concerned, the
issue as I see it is moving beyond "UTF-8 string" with some length
limits.  Something like draft-bray-unichars would be a step in
the right direction, leaving open the question of whether it goes
far enough.  But, independent of that answer, it suggests that the
current I-D in IETF LC needs some work.

best,
   john
