Re: [Last-Call] Last Call: <draft-faltstrom-base45-09.txt> (The Base45 Data Encoding) to Informational RFC

--On Thursday, February 3, 2022 23:23 +0100 Carsten Bormann
<cabo@xxxxxxx> wrote:

> On 2022-02-03, at 23:12, Viktor Dukhovni
> <ietf-dane@xxxxxxxxxxxx> wrote:
>...
>> Why is it base-45 and not base-41?  With (41**3) > 65535,
>> just 41 code points would suffice.  What is the rationale for
>> using additional code points?
> 
> I made this point in
> <https://mailarchive.ietf.org/arch/msg/cbor/gnX1E6qp0NttNjbuhe
> phcG6BnSQ> and earlier in private mail to one of the authors.
> 
> What I got back was apparently telling me that QR-Codes do
> offer 45 different characters, so hence base-45. This is, of
> course, equivalent to saying that ASCII offers 95 graphic
> characters, so with that argument base-64 should have been
> base-95 all along.

That ASCII example is interesting because the Base-64 design,
IIRC, carefully considered and eliminated characters that were
not in ISO 646-BV, settled on the upper and lower case letters
and the digits, and then, to get to 64, added "+" and "/"
(and "=" for padding).  Almost any of the other 30 graphics
would have raised issues.
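A quick check of the composition described above (assuming the standard Base-64 alphabet of RFC 4648):

```python
import string

# Upper case (26) + lower case (26) + digits (10) + "+" and "/" = 64 symbols;
# "=" is padding only, so 95 - 64 - 1 = 30 ASCII graphics remain unused.
b64_alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
print(len(b64_alphabet))  # 64
```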

> I think we can all agree that the base-45 specification does
> not describe the best possible design.  However, the approach
> is now widely used, e.g. in the European Digital Health
> Certificate, so the window to get this fixed has closed.
> Let's get this published.

Yes to the latter.   For "best possible" see below.

> Instead of trying to fix this document, another specification
> could be written that defines a base-41 (or base-40 +
> overflow) variant that could present fewer of the
> interoperability concerns base-45 does.

Well... One can probably make a case for almost any encoding of
this general type.  There are also arguments for compactness,
especially compactness of the encoding for characters that are
likely to appear together when more of Unicode is used -- that
is the reason that drove the Punycode encoding for IDNA.   I
suggest one can determine "best" only by stating a set of
criteria (or selecting an application that will generate them)
and that different, equally reasonable, criteria will yield
different "best"s.  There is, however, one other issue: every
new ASCII-compatible encoding we introduce is a new problem for
universal interoperability, a problem that gets more serious if
one cannot easily tell them apart by simple examination of the
strings.  So, my response to a base-41 suggestion would be,
"well, sure, but do we really need another one of these", or,
put a bit differently, "if we count the interoperability issues,
the possibility of incorrect decoding by wrongly guessing which
system is in use, and the need to carry around additional code
to deal with an additional scheme as costs, what are the
benefits that outweigh those costs".
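The "wrongly guessing which system is in use" point can be illustrated with a small (hypothetical) helper: because these alphabets overlap heavily, many strings are syntactically valid in several schemes at once, so inspection alone cannot pick the decoder.

```python
import string

# Alphabets of three common ASCII-compatible encodings (Base45 per the draft,
# Base64 and Base32 per RFC 4648, padding characters excluded).
ALPHABETS = {
    "base45": set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"),
    "base64": set(string.ascii_letters + string.digits + "+/"),
    "base32": set(string.ascii_uppercase + "234567"),
}

def plausible_schemes(s: str) -> list[str]:
    """Return every scheme whose alphabet accepts all characters of s."""
    return [name for name, chars in ALPHABETS.items() if set(s) <= chars]

print(plausible_schemes("BB8"))  # ['base45', 'base64'] -- ambiguous on its face
```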

For the present document, it is implemented, it is deployed, it
is in heavy use in some important applications, and, even if it
were appropriate to ask those questions, it would be too late.
So let's get it published and move on... whether moving on
involves yet more systems of this type or not.

   john

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call


