--On Monday, April 8, 2024 14:06 +1000 Bron Gondwana
<brong@xxxxxxxxxxxxxxxx> wrote:

> On Mon, Apr 8, 2024, at 01:09, John C Klensin wrote:
>> Having just reread the charter, I think the latter situation --
>> including the need to revise the charter and downgrade specs -- is
>> not appropriate, or even tenable. But, if that is the case, then,
>> while it would be reasonable for a spec to explicitly discuss,
>> e.g., what JMAP implementers are doing, an IETF Standards Track
>> spec should still be specifying The Right Thing to Do rather than
>> ignoring that and specifying current implementation practice
>> alone.
>>
>> This is not just about formalities. As has been pointed out
>> elsewhere, unrestricted use of arbitrary UTF-8 strings will last
>> in a particular context up to the point that some problem occurs
>> that receives wide publicity and perhaps ridicule for the
>> implementers. It would be far more appropriate (and desirable) for
>> the IETF to specify what should be done when (and if) people get
>> the message than to say nothing and be among the ridiculed. If
>> we know the potential problems, let's not put those implementers
>> in the position of being able to say "no one ever told us about
>> that".
>
> There's a related issue here which is that jmap-contacts is largely
> transporting jscontact, which is a format from the CALEXT working
> group, where there's a requirement to be cross compatible with
> VCARD (RFC 6350), an existing IETF format which has this to say
> about UTF-8:
>
> 3.1 <https://datatracker.ietf.org/doc/html/rfc6350#section-3.1>. Charset
>
> The charset (see [RFC3536
> <https://datatracker.ietf.org/doc/html/rfc3536>] for
> internationalization terminology) for vCard is UTF-8 as defined
> in [RFC3629 <https://datatracker.ietf.org/doc/html/rfc3629>].
> There is no way to override this.
> It is invalid to specify a
> value other than "UTF-8" in the "charset" MIME parameter (see
> Section 10.1
> <https://datatracker.ietf.org/doc/html/rfc6350#section-10.1>).
>
> NON-ASCII = UTF8-2 / UTF8-3 / UTF8-4
> ; UTF8-{2,3,4} are defined in [RFC3629
> <https://datatracker.ietf.org/doc/html/rfc3629>]
>
> VALUE-CHAR = WSP / VCHAR / NON-ASCII
> ; Any textual character
>
> In order to round-trip values between VCARD and jscontact, it is
> necessary to be able to represent values found in arbitrary
> real-world VCARDs.
>
> ...
>
> Meanwhile, there is the issue of the fields which are specific to
> jmap-contacts and other future JMAP specs. On the one hand, we
> could specify a restricted character set range for these free-text
> fields (being aware that servers may also have their own stricter
> restrictions on legal values for various reasons); however this
> would create a distinction between free text fields in these specs
> and free-text fields in other already published JMAP specs,
> increasing the complexity for server implementations, who would
> need different validators.
>
> Also, it's unlikely that clients will all validate their input
> values - and clients will also need to handle misbehaving servers
> sending them invalid content; so I don't see the real world benefit
> to imposing this restriction given that it won't apply to all JMAP
> objects, or even all of jmap contacts' fields (given the need to
> remain compatible with RFC 6350 data).
>
>
> I suggest saying something like this in the security considerations:
>
> Servers SHOULD reject any attempt to set names using characters not
> in the Freeform Class as defined in PRECIS.
>
> and
>
> Clients SHOULD strip any characters not in the Freeform Class as
> defined in PRECIS before displaying strings.

Bron,

Thanks. Especially given the hole we have gotten ourselves into
(and which you described better than I could and with cases I
didn't know about), I think your suggested solution is fine.
My concern was only that we neither encourage unrestricted UTF-8
nor say "yep, might be a problem but not our problem".

I could even live with something that builds on the above with a
bit of explanation, e.g., prefacing your statements with
something like

   "Experience has shown that unrestricted use of Unicode in
   UTF-8 form can lead to problems with consistent rendering,
   users reading text and interpreting it differently than
   intended, and, if text is copied from one location and
   pasted to another, unexpected results. Therefore..."

That would provide some rationale for the SHOULD requirements,
answer any "why not MUST" questions should they arise, and open
the door to "this is ok because we know enough to do something
safe" decisions (no one I know of has claimed that PRECIS is
perfect).

Whether such a mini-explanation would make things better or worse
is, AFAIAC, up to you and the WG. I promise to not whine about
either. Should you decide to go with the explanatory text but
think references are required, I (and probably Peter and several
others) could come up with them fairly quickly.

thanks,
   john

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
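
For concreteness, the two character rules discussed in this thread
can be sketched in Python. This is a minimal sketch under stated
assumptions, not text proposed for any draft. The first function
follows the VALUE-CHAR ABNF quoted from RFC 6350 above. The second
part is only a coarse stand-in for the PRECIS Freeform Class: it
filters on Unicode general category and does not implement the
actual PRECIS derivation (exceptions, contextual rules, ignorable
code points). All function names are hypothetical.

```python
import unicodedata


def is_vcard_value(text: str) -> bool:
    """True if every character matches VALUE-CHAR as quoted from RFC 6350.

    VALUE-CHAR = WSP / VCHAR / NON-ASCII, where NON-ASCII is anything
    UTF-8 encodes in 2 to 4 bytes (U+0080 and above, minus surrogates).
    """
    for ch in text:
        cp = ord(ch)
        is_wsp = ch in " \t"
        is_vchar = 0x21 <= cp <= 0x7E
        is_non_ascii = cp >= 0x80 and not 0xD800 <= cp <= 0xDFFF
        if not (is_wsp or is_vchar or is_non_ascii):
            return False
    return True


# ASSUMPTION: approximation only. Keeps letters, marks, numbers,
# punctuation, symbols and space separators; drops controls, format
# characters, unassigned, private-use and surrogate code points.
# The real Freeform Class has further rules not modelled here.
_KEEP_MAJOR = frozenset("LMNPS")


def roughly_freeform(ch: str) -> bool:
    cat = unicodedata.category(ch)
    return cat == "Zs" or cat[0] in _KEEP_MAJOR


def server_accepts(name: str) -> bool:
    """Server side of the suggested SHOULD: reject on any disallowed character."""
    return all(roughly_freeform(ch) for ch in name)


def strip_for_display(text: str) -> str:
    """Client side of the suggested SHOULD: drop disallowed characters."""
    return "".join(ch for ch in text if roughly_freeform(ch))
```

A string containing U+202E (RIGHT-TO-LEFT OVERRIDE) illustrates the
tension described above: is_vcard_value() accepts it, so it can
arrive via a round-tripped vCard, while server_accepts() rejects it
and strip_for_display() removes it.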