>The implication of this is that the entire registry would need to be
>fetched, right? Given I'm told the registry under discussion is a
>bit on the large-ish size, this might be somewhat problematic. Can
>you give me an idea of the anticipated scale of the number of
>applications that will be wanting to fetch this registry?
David,
let us think in terms of the network for once (the IETF is about
networking). There are currently 500 million users. I think a
conservative figure for the ten years ahead that we are discussing is
a Multilingual Internet of 2 billion CPUs. These CPUs must be able to
check on a regular basis whether they are up to date. Best practice
would be at connect time and once every day.
The IETF registry adds complexity (extlangs, comments, changes) on
top of the ISO 639 series. The ISO 639-3 table of 7,500 items will not
be printed because it will be constantly maintained (please, Peter,
can you comment). The same will hold (and probably more so) for the
ISO 639-6 tables, which may range between 15,000 and 30,000 items
(please, Debbie, can you comment). Peter commented that one or
several updates a week is most probable.
This means an updating scheme comparable to a DNS root system with
40,000 TLDs. You probably have the figures for rs.internic.org to
compare with, plus access to the RSSAC root servers. Except that
the root file is 65 K and we are talking about a file probably 100 to
10,000 times larger. And the root file is supported by the DNS
protocol (the root servers are only occasionally called upon by
users: in case of a TLD typo, or when their ISP nameserver does not
yet have, or no longer has, information on a requested TLD).
Fetching the file would be like carrying out an AXFR of the root or an
FTP access to rs.internic.org. A DNS-like protocol, which I proposed
in vain, could make it possible to decrease, organise, and distribute
that load.
Langtags are probably like TLDs: some will never be queried in some
geographic areas. Many will have a very long TTL (reasonably
equivalent to the real-life TTL of the root). But many reasons may
call for a shorter system TTL.
Obviously, caching the registry at the user end would help a lot. Each
user would probably need only a limited number of languages to be
documented (those in his filters and those in his relational area),
the others being pollution to him.
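
A minimal sketch of such a user-end cache, in Python, assuming each
registry answer comes with its own TTL as in the DNS. The class name
LangtagCache and the fetch callable are hypothetical; no such resolver
interface is defined anywhere.

```python
import time


class LangtagCache:
    """Keeps only the langtags this user actually asks for, each with a TTL."""

    def __init__(self, fetch, clock=time.time):
        # fetch is a hypothetical resolver call: fetch(tag) -> (record, ttl_seconds)
        self._fetch = fetch
        self._clock = clock
        self._entries = {}  # tag -> (record, expiry timestamp)

    def get(self, tag):
        now = self._clock()
        entry = self._entries.get(tag)
        if entry is not None and entry[1] > now:
            return entry[0]                 # still fresh: no network traffic
        record, ttl = self._fetch(tag)      # expired or never seen: ask upstream
        self._entries[tag] = (record, now + ttl)
        return record
```

Tags the user never queries are never fetched, and a long TTL keeps
the rarely changing ones off the network.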
However, this must be considered in a context of interoperability
with other language codes. RFC 4646 does not care about
interoperability, but "x-tags" are constrained enough to allow a
partial strategy based upon tags of 8 alphanumerics plus a signature.
These private tags will need to be verified and validated all the
same. Either you support them, and queries go to you and your
mirrors, or you don't, and queries will necessarily go to private
resolvers (much as for the DNS root, so an equivalent load on the top
system). I initially explained that langtags had to document
referents to be of interest to computers and extended services (to
the content). Now, a simple format for private langtags can be "x-"
followed by 8 alphanumerics (three for the language, two for the
script, two for the country, one for the referent in that context; 36
possibilities is large enough until an RFC 4646ter adapts it),
followed by a private langtag library signature. The central registry
will be quite large, with more information. But the checking traffic
can be limited to a warning on changes, through the distribution of a
regex of the 8 alphanumerics that changed and a compacted date. This
means 12 alphanumerics per change announcement. This may mean a
message of 100 to 2,000 characters a day at login time, obtained by
the ISP from the IANA. This information could even be added as a
footer to the root file, since that file is already downloaded by
ISPs.
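
As an illustration only, one way to pack such a 12-alphanumeric change
record in Python. The base-36 date, the epoch of 2000-01-01, and all
function names are my assumptions. For simplicity the sketch carries
the literal 8-alphanumeric tag rather than a regex.

```python
from datetime import date, timedelta
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols
EPOCH = date(2000, 1, 1)  # assumption: day zero of the compacted date


def encode_date(d):
    """Compact a date into 4 base-36 characters (days since EPOCH)."""
    n = (d - EPOCH).days
    if not 0 <= n < 36 ** 4:
        raise ValueError("date out of range for 4 base-36 characters")
    out = ""
    for _ in range(4):
        n, r = divmod(n, 36)
        out = ALPHABET[r] + out
    return out


def decode_date(s):
    return EPOCH + timedelta(days=int(s, 36))


def encode_change(tag, changed_on):
    """One change record: the 8-alphanumeric tag plus a 4-character date."""
    if len(tag) != 8 or not (tag.isascii() and tag.isalnum()):
        raise ValueError("tag must be exactly 8 ASCII alphanumerics")
    return tag.lower() + encode_date(changed_on)


def parse_announcement(message):
    """Split a daily message into (tag, change date) pairs."""
    if len(message) % 12:
        raise ValueError("message is not a whole number of 12-char records")
    return [(message[i:i + 8], decode_date(message[i + 8:i + 12]))
            for i in range(0, len(message), 12)]


def stale_tags(message, cached):
    """Tags held locally (tag -> date of local copy) that changed since."""
    return [tag for tag, changed in parse_announcement(message)
            if tag in cached and cached[tag] < changed]
```

With 12 characters per record, a message of 100 to 2,000 characters
carries roughly 8 to 166 change announcements, and a client only needs
to refetch the tags that stale_tags returns.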
We considered these avenues and others for the MDRS project, and
tested them. They call for crosswalks with different standards and
codes outside the Internet (languages are not restricted to the
Internet).
I hope this gives you some elements.
jfc
_______________________________________________ Ietf@xxxxxxxx https://www1.ietf.org/mailman/listinfo/ietf