On 26 Feb 2021, at 3:50, Ricardo Signes wrote:

> It's true that the definition of emoji sequence is not (yet?) entirely stable. In the context of message content, I'm not sure what teeth the dragon is presenting here. I agree that I would not want to use this reference to TR51 in specifying many kinds of things, but in this context, I don't see the problem.

Since I introduced the wording of "dragon" and "teeth" while being relaxed about what Adam suggested, here is the explanation.

If the IETF has a formal definition of what is allowed and not allowed, people might, for various reasons and regardless of the Postel Principle, take for granted that there really is one set of Unicode characters in specific sequences that is allowed, and some that are not. This is especially so because the Unicode specification for emoji sequences is VERY detailed about which emoji characters can be joined and which cannot, and in what way. The purpose is to make rendering easier and (as they claim) more predictable and safer. See the SSAC document that I named and Dave linked to (thanks) for examples of such combinations of characters and the dangers when comparing them (ordering of characters, for example).

As John has explained, this part of the Unicode specification is not stable between Unicode versions, and it is written in a very different format from the grammar formats the IETF is used to. Because of that, expecting people who read the RFC to also understand, parse and correctly implement the Unicode specification is, I claim, something that just will not happen. The Unicode spec will be interpreted correctly by Unicode people. Maybe. It is VERY complicated. Have you tried to implement it yourself? I am just asking, because you might be one of the people who do understand it. I do not understand all the details, and given a random sequence of Unicode characters I cannot say whether it is covered or not. And I have been reading Unicode specifications for breakfast for a number of years.
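To make the ordering problem concrete, here is a small Python sketch (the language choice and the variable and function names are illustrative assumptions; the code points are the real ones for MAN, WOMAN, GIRL and ZERO WIDTH JOINER). It builds two zero-width-joiner sequences from the same three emoji in different orders and shows that they compare as different strings, and that Unicode normalization (NFC) leaves both unchanged, so normalizing before comparing does not reconcile them. Whether either ordering is displayed as a single "family" glyph depends on the emoji data of a given Unicode version and on the platform; the sketch does not check that.

```python
import unicodedata

# Family built as MAN + ZWJ + WOMAN + ZWJ + GIRL
man_woman_girl = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
# Same three emoji and the same joiners, but with WOMAN first
woman_man_girl = "\U0001F469\u200D\U0001F468\u200D\U0001F467"


def code_points(s: str) -> list[str]:
    """Return the code points of s formatted as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s]


for label, seq in (("man-woman-girl", man_woman_girl),
                   ("woman-man-girl", woman_man_girl)):
    print(label, " ".join(code_points(seq)))

# Both strings contain exactly the same code points...
print(sorted(man_woman_girl) == sorted(woman_man_girl))  # True
# ...yet a plain code-point comparison says they differ.
print(man_woman_girl == woman_man_girl)  # False
# These emoji and ZWJ have no decompositions and canonical combining
# class 0, so NFC never reorders them: normalization does not help here.
print(unicodedata.normalize("NFC", woman_man_girl) == man_woman_girl)  # False
```

Any comparison rule that wants to treat such sequences as "the same" therefore has to go beyond code-point equality and normalization, which is where the version-dependent emoji data comes in.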
This is also why I am so explicit in saying that it is the _reference_ that is the dangerous thing here, and why I think what Adam wrote is OK (at least as I read it): the value of the attribute we pass around could be, according to the IETF standard, "a sequence of Unicode characters" (see John's comment on why you should not talk about octets). This is because the IETF already (as Adam writes between the lines) knows the details, oddities and dangers that come with text: control characters, line breaks and what not. So even though it might at first look MORE dangerous to allow the field to be a series of Unicode characters, from my perspective it is MUCH safer than trying to get IETF people to understand the Unicode specification of emoji sequences. That, from my perspective, is where the dragon is, and why its teeth are sharp.

Patrik
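As an illustration of the simpler alternative, here is a minimal Python sketch of a "sequence of Unicode characters" rule that only excludes the well-known troublemakers for text. The function name `is_acceptable_value` and the exact set of rejected General Categories are hypothetical choices for this example; they are not taken from Adam's text or from any draft. The check needs no table of emoji sequences.

```python
import unicodedata

# Hypothetical policy for illustration only (not from any draft or RFC):
# reject General Categories that commonly cause trouble in protocol text.
REJECTED_CATEGORIES = {
    "Cc",  # control characters, including CR, LF and TAB
    "Cs",  # surrogate code points (not valid Unicode scalar values)
    "Zl",  # U+2028 LINE SEPARATOR
    "Zp",  # U+2029 PARAGRAPH SEPARATOR
}


def is_acceptable_value(value: str) -> bool:
    """Accept any non-empty sequence of Unicode characters except
    control characters, surrogates and line/paragraph separators.

    No attempt is made to decide whether the value is a "valid"
    emoji sequence in the sense of the Unicode emoji specification.
    """
    if not value:
        return False
    return all(unicodedata.category(ch) not in REJECTED_CATEGORIES
               for ch in value)


print(is_acceptable_value("\U0001F44D"))  # True: THUMBS UP SIGN (So)
# True: ZWJ is a format character (Cf), so any ordering passes
print(is_acceptable_value("\U0001F469\u200D\U0001F468\u200D\U0001F467"))
print(is_acceptable_value("ok\r\nBcc: x"))  # False: CR and LF are Cc
print(is_acceptable_value(""))  # False: empty value
```

Note that this sketch also lets through unassigned code points (category Cn); whether to allow those is a further policy decision.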
-- last-call mailing list last-call@xxxxxxxx https://www.ietf.org/mailman/listinfo/last-call