Re: opaque, was Call for comment: <draft-iab-doi-04.txt> (Assigning Digital Object Identifiers to RFCs)

John C Klensin <john-ietf@xxxxxxx> · Sun, 05 Jul 2015 14:24:32 -0400

--On Sunday, 05 July, 2015 08:00 +1200 John R Levine
<johnl@xxxxxxxxx> wrote:

>> Since DOIs are opaque, that doesn't preclude future use of a
>> numeric prefix as well for something completely different.
> 
> Right.  Now I see what the problem is -- opaque identifiers
> are jargon from databases (I did my PhD research on them) and
> many IETFers don't understand what they are.
> 
> The point of an opaque identifier is that you can make no
> assumptions whatsoever about what its structure or format is.
> You can just find it somehow, and you can hand it back to
> services that use it to look up what it refers to.
>...
> I hope it's now clear why that would be a bad idea, and also
> why you shouldn't make any assumptions about what the DOIs of
> RFCs after RFC9999 will be.

And the problem with opaque identifiers that happen to be
constructed in a consistent way, one that humans can deduce and
relate to something else, is that they will be used that way.
To be clear about what follows, I'm far more concerned about the
case of "given DOI, find document" than I am about "given
document bibliographic reference, find and write down the DOI".

To use his example, Andy knows how to get from
10.1364/JOCN.4.000001 to a particular journal, volume, and page
number and he is almost certainly going to do that by using the
algorithm in his head.  Unless he is much more compulsive and
has far more time on his hands than I believe is the case, he is
unlikely to go to a DOI resolver system for each of the DOIs in
that form, resolver systems that will just tell him what he knew
already and that may point him through more indirection than he
would find useful if he wants the document.    You may know it
is formally an opaque identifier, he may know it is formally an
opaque identifier, but, in practice, he knows there is nothing
opaque about it and that it would be silly to pretend.
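
To make "the algorithm in his head" concrete, here is a rough
sketch of it in Python.  The pattern is only my guess at the
layout of the one example DOI above (journal code, volume, page);
the names JournalRef and guess_reference are made up for
illustration, and nothing here is defined by the DOI system.

```python
import re
from typing import NamedTuple, Optional


class JournalRef(NamedTuple):
    journal: str
    volume: int
    page: int


# Assumed layout, inferred from the single example
# 10.1364/JOCN.4.000001: prefix, journal code, volume, page.
_GUESSED_LAYOUT = re.compile(r"^10\.1364/([A-Z]+)\.(\d+)\.(\d+)$")


def guess_reference(doi: str) -> Optional[JournalRef]:
    """Decode a DOI by pattern, with no resolver lookup.

    Returns None when the DOI does not fit the guessed layout,
    which is all a reader can do with a truly opaque identifier.
    """
    match = _GUESSED_LAYOUT.match(doi)
    if match is None:
        return None
    journal, volume, page = match.groups()
    return JournalRef(journal, int(volume), int(page))


print(guess_reference("10.1364/JOCN.4.000001"))
print(guess_reference("10.1364/a1b2c3d4"))  # made-up unstructured suffix
```

The first call yields a journal, volume, and page without any
lookup; the second returns None, and the reader is back to the
resolver.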

Now, suppose the journal publisher decides that, from volume 8
onward, they are going to add 100 times the first three digits
of Pi to the volume number field, eliminate the separating dot,
and add an extra zero to the page number field in case they were
to publish a lot that year.   Assuming I got the arithmetic
right, that would give the next volume's first article a DOI of

  10.1364/JOCN.3220000001

Andy (and hundreds of other readers who are behaving the same
way) would, I assume, be at least mildly irritated because the
new format is different and requires that they do something
different and perhaps a bit harder.   The curators of the system
would, no doubt, say "opaque identifier, you had no right to the
expectation that you could parse the DOI and translate it into a
volume and page number, and still don't have that right even if
you can figure out the new algorithm".  Both would be correct in
their positions, but knowing that is not especially helpful.

Now assume that, a few years in the future and perhaps to
commemorate volume 10 and to educate their readers, the
identifier were changed again to eliminate "JOCN." and the
algorithm and, instead, use some form of a hash on the article
contents (plus a database lookup to check for uniqueness and a
way to adjust if needed).  Again, it is an opaque identifier, so
Andy and his colleagues have, in the eyes of the
identifier-assigning folks and your comments above, no basis for
complaint even though this new identifier format forces them to
do a database lookup for each DOI.  From their point of view
that is unreasonable.  When publisher 1364 also raises the
subscription rates to cover the cost of the high performance and
redundant DOI servers that were not needed before because most
readers knew the algorithm and skipped the lookup, it would seem
even more unreasonable.  On the other hand, the DOI-assigning
database administrators just say "opaque identifier, what are
you complaining about".  And, again, both are right.

I think there are clear human factors preferences about whether
the use cases or the opaque identifier argument prevails, but
YMMV.

Now, restated more precisely and in the light of your
explanation, I objected (and object) to the choice of "rfc1149"
because it provides an opportunity for confusion if the format
of the identifier is changed in the future and because the use
of ASCII characters, especially as part of what will be
perceived as a field, rather than a dot-separated subfield, may
be inconvenient to our growing international community for no
good reason.    That doesn't mean "can't be changed in the
future" and I apologize for anything I may have said that was
interpreted that way.  But saying "opaque identifier" does not
make changes less inconvenient and disruptive for the reasons
described above.

Now my prediction, for reasons that parallel Melinda's comments,
is that no active participant in the IETF will ever (other than
experimentally or for demonstration purposes) locate and
retrieve an RFC, especially an RFC referenced from another RFC,
by looking up the DOI.  That probably means that, with the
exception of symbolic value and a tiny number of readers,
including the DOI for an RFC in references within RFCs is a
waste of time and bits, but see below.  Because we won't use
them, an opaque identifier, even 10.17487/gazornplatz, should be
just fine for direct IETF purposes.  That also holds if the goal
of assigning DOIs is symbolic, i.e., not that we expect anyone
to use them to find RFCs but that we want to impress some group
of people with the fact that we have and assign DOIs, because
doing so adds prestige or credibility.  But the more we rely on
the assumption that
RFC-related DOIs won't be used to find RFCs, the more important
it is that the suffixes be structured in a well-known, obvious,
and stable way in practice, even if they are opaque identifiers
in DOI theory.

  best,
    john