Larry,

Other than disabusing people of ideas that this is easy, I really do not want to talk about protocol details or solutions. I want to talk about the procedural issue of how we move forward, given that we have an interesting array of symptoms, e.g.,

* drafts have been initiated to address specific and important problems, or even to start identifying them in depth, and the IESG (fwiw, I believe correctly, but others may not agree) has been unwilling to address them because there has not appeared to be sufficient in-depth expertise and energy/commitment to form a working group.

* the IAB put together an I18N Program effort to try to get on top of some of the issues. I would claim it did some useful work in its early days, but the IAB shut it down a year ago after concluding that it wasn't doing anything, had not done anything for some time, and was showing no signs of that changing (IAB, if you don't like that summary, suggest another one, but I think that is reasonably accurate). That decision was announced, rather than being discussed with the community or even with the set of Program members who were not on the IAB, something the IAB clearly has the right to do.

* As far as I know, we have no plan to get unstuck from the above.

I note Nico's suggestion of:

> I mean, here's how you do this:
>
> a) you get a couple of ADs with I18N experience,
> b) also someone on the IAB with I18N experience,
> c) add a mandate for an I18N Considerations section,
> d) add an I18N directorate

And I note that we've had ADs and IAB members with I18N experience and it hasn't appeared to make any long-term difference, and that there have been efforts to explain the importance of these issues to several recent Nomcoms, and either they could find no candidates with significant levels of the right skills or they didn't consider the issues very important. We've also had a mandate for what was then called "multilingual Considerations" for over 20 years (see Section 8.1(I) of RFC 2130). That IAB Program wasn't a directorate as we usually define the term, but it had a lot of the same properties. I'm not interested in casting blame, but comparing that list with things we have already tried certainly reinforces my sense that we have gotten stuck. Or, if you prefer, that we are at the bottom of a hole and all we can think to do is to stand there and do nothing, or to keep digging.

Relative to your proposal, I also note that we made a decision (also more or less referred to in RFC 2130) that actual protocol elements should stay in ASCII unless there was a compelling reason to "internationalize" them. We've learned a lot since then and the world has changed, but, at least IMO, we have never really reexamined that principle.

As to an "identify which names can cause problems" service, that has been tried too. A number of script-specific efforts, starting with the JET work reflected in RFC 3743 and extending forward to include at least ICANN's LGR effort, have focused on identifying characters (or Unicode code points) that might be problematic in various ways and combinations. Others have looked at potentially confusing relationships, both due to accidental confusion (a hard problem in the general case) and to malice (a nearly impossible one, IMO). It simply isn't as easy as you think, especially if the IETF does not abandon the principle that we will not go through Unicode one code point at a time looking for problems, possibly in comparison with every other code point.
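As a toy illustration of why the "confusing relationships" part is hard (a sketch I am making up for this note, using Python's standard unicodedata module; the specific examples are mine and not drawn from any of the drafts or registry tables mentioned above):

    # Toy illustration only -- examples are my own, not from any draft or
    # registry table discussed above.  Python 3, standard library only.
    import unicodedata

    def nfc(s):
        """Return the NFC (composed) normalization form of a string."""
        return unicodedata.normalize("NFC", s)

    # Composed vs. decomposed e-acute: different code point sequences
    # that normalization does reconcile.
    assert "caf\u00e9" != "cafe\u0301"
    assert nfc("caf\u00e9") == nfc("cafe\u0301")

    # Greek final sigma vs. sigma: NFC leaves them distinct; only case
    # folding equates them, and a registry may not want that equation.
    assert nfc("\u03c2") != nfc("\u03c3")
    assert "\u03c2".casefold() == "\u03c3".casefold()

    # Cross-script confusables (Cyrillic 'a', U+0430, vs. Latin 'a'):
    # neither normalization nor case folding helps; you need a
    # confusables table (e.g., Unicode's confusables.txt), and someone
    # has to maintain it.
    assert nfc("\u0430") != nfc("a")
    assert "\u0430".casefold() != "a".casefold()

Each of those checks is easy in isolation; the hard part is agreeing on which equations to apply and maintaining the tables behind them, which is exactly where the earlier efforts got stuck.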
It may also be worth pointing out that one of the "stuck" (and now-expired) drafts was devoted to the identification of troublesome code points. This simply isn't as easy as whipping up a few programs ... and getting consensus that those programs do the right thing (even more or less) would be very difficult because counterexamples keep coming out of the proverbial woodwork or crawling out from under proverbial rocks.

Personally, I think that "troublesome character" approach would be helpful as long as we can be very clear about its limitations, including remembering that it isn't the Final Ultimate Solution to anything or a replacement for skilled and knowledgeable human judgment, and as long as we can figure out a reliable and sustainable way to maintain the table. We don't have an obvious way forward with either of those requirements, and that is, again, a reason I thought this discussion was worth starting.

best,
   john

--On Friday, June 1, 2018 04:51 +0000 Larry Masinter <masinter@xxxxxxxxx> wrote:

> A modest proposal (I'm sure this is controversial so flame
> away...)
>
> A big part of the problems in i18n in IETF protocols have to
> do with extending protocol elements from ASCII to Unicode, and
> how to avoid difficulties when that happens.
>
> Protocol elements include domain names, URLs, email addresses,
> file names.
>
> But where do these Unicode names come from? They're not
> arbitrarily generated by automated processes, they're
> constructed from strings that are selected, typed in,
> registered. So focus on encouraging people to choose strings
> that won't give problems.
>
> A large specification of all of the use cases to avoid is very
> difficult to write and hard to review. There are very many
> special cases (final sigma, umlauts, private name characters,
> non-normalization of combined forms) with expertise widely
> distributed. I'm not sure the solution is "more specs"; in
> fact, there are many obscure special cases, and the specs are
> very difficult to write and review.
>
> I wonder if there's any interest in building an open-source
> service that would, when given a proposed domain name or URL
> or email address, tell you what problems various subsets of
> users would have when trying to deploy that name (e.g., names
> that don't display properly on popular platforms, names that
> can't be reliably typed in correctly even if they can be
> viewed, those that are likely to get confused with other
> similar but different names).
>
> Perhaps get started at a Hackathon?
>
> I did reserve the domain name "caniuse.name" that I will offer
> to any sincere effort.