Re: Last Call: An IETF URN Sub-namespace for Registered Protocol Parameters to BCP

Graham Klyne <GK@NineByNine.org> · Wed, 03 Jul 2002 19:07:41 +0100

At 10:57 AM 7/3/02 -0400, Keith Moore wrote:
> > Spurred by XML and related technologies (which I assert are far more than
> > mere "fashion") we are seeing URIs used for a wide range of purposes which
> > are not constrained by a requirement for dereferencing.   The use of URIs
> > for identifying arbitrary things is now a fact of life, and in some
> > technical domains is proving to be extremely useful.  You claim "harm",
> > but I recognize no such harm.
>
>Clarification: I claim "harm" for the proposed use of *URNs* because
>URNs were designed to be long-term stable names for (at least potentially)
>network-accessible resources, whereas the proposal is to use them as a
>way of generating globally unique strings like UUIDs or OIDs.

I still don't see the "harm" here.

Another way to look at this might be:  they all have potentially 
network-retrievable representations, but not all uses depend on being able 
to perform the retrieval.

> > Having different syntactic contexts in which names are used will
> > inevitably lead to different syntactic name forms.  I submit that the
> > real challenge here is not to prevent the use of varying syntax, but to
> > lock the various syntactic forms to a common semantic definition
>
>Oddly enough, having different syntactic contexts also tends to cause
>differences in semantic definition.  In one syntactic context order
>of elements can be significant whereas it's not in the other.  One syntactic
>context is designed to allow individual components to be accessed
>independently of the others while another expects the entire resource
>description to be available to the consumer.  One makes it easy to group
>related items; another doesn't have a way of representing relationships
>between items.  The semantic definitions tend to be influenced by these
>factors.
>
>I'm all for reuse of data models where it makes sense, but if the goal
>is really to "lock the various syntactic forms to a common semantic
>definition" (presumably one which is compatible with XML) then I take
>strong issue with that, as the XML model is quite dysfunctional for
>many purposes.  (as are the others, it's just that XML is the current
>bandwagon)

I'm puzzled -- you appear to be arguing my point.  Yes, different syntactic 
frameworks will (in isolation) tend to yield differing semantics.  Yes,
different syntactic frameworks are better suited for different 
purposes.  But it seems to me that referring different uses to the same
original definition would help to inhibit that divergence -- and if factors
like ordering or grouping are significant, then the definition will
(hopefully) capture that and place constraints on the syntactic contexts
for re-use.

> > -- in this case, providing
> > a way to create syntactic URI forms that can be bound to protocol
> > semantics in a way that inhibits semantic drift between the different
> > forms.
>
>But such drift is almost inevitable.  You can't recast some existing
>data structure in XML and use it widely and expect the meanings of the
>protocol elements to stay the same.  And in essentially every example I've
>seen of an attempt to do this, the meanings of the protocol elements are
>changed subtly from the very beginning, usually by trying to use XML
>structure to represent relationships that aren't explicit in the original
>data model.  More generally, an XML representation of a data model will get
>used differently than the original representation, and the semantics of the
>individual protocol elements will almost certainly drift as a result.
>
>(Actually this happens even when you use the same representation.
>RFC 822 headers had subtly different meanings on BITNET than on
>the Internet, because there were enough differences in the two user
>communities and the mail reading programs used by those communities.
>Similarly, casting a data model into XML means that a different set
>of tools will be used to access/manipulate that data - indeed that
>is the entire point of doing so - but this *will* cause semantic drift
>in the data model between the two environments)
>
>Using URIs for the names of the data elements won't stop that kind of drift.

But not trying to re-use existing definitions seems to be a recipe for 
Balkanization.

Maybe it won't work for all applications, but I think there are a 
substantial number of cases where re-use of existing definitions is a 
reasonable and desirable goal.  I have two ongoing projects for which I 
would really like to see this URN namespace proposal approved:

(a) Distributed storage and analysis of email and other message metadata.

(b) common feature descriptions for IETF/W3C content negotiation efforts.

> > One of the motivating factors in this work (for me, at least, and I think
> > for others) has been to draw together some of the divergent strands of
> > thinking that are taking place in the IETF and W3C.  W3C are fundamentally
> > set on a course of using URIs as a generic space of identifiers.  IETF
> > have a number of well-established protocols that use registries to
> > allocate names.  Neither of these are going to change in the foreseeable
> > future.  So do we accept a Balkanization of Internet standards efforts,
> > or do we try to draw them together?
>
>Some things don't mix very well, even if they are quite useful individually.
>The traditional examples are oil and water.

That seems like a non-argument for opposing this proposal.  Even emulsions 
have their uses.

> > A particular case in point is content negotiation.  The IETF have prepared
> > a specification for describing media features that uses a traditional form
> > of IANA registry to bind names to features.  In parallel with this, W3C
> > have prepared a specification which has some similar goals, but which uses
> > URIs to represent media features, and relies on the normal URI allocation
> > framework to ensure the minting of unique names as and when needed.  (I
> > have some reservations about this, but that can't change what is actually
> > happening.)
>
>But neither do we have to endorse it just so they will use our stuff.
>Especially when their using our stuff dilutes the utility of our stuff
>by not requiring widespread agreement on the media features used.

Come again?  That seems to me to be an entire non sequitur.  How can other
people using our stuff dilute its utility?  It is precisely in the nature
of this proposal that using these URIs would be assenting to the IETF 
definition of their meaning.

> > This URN namespace proposal will provide a way to incorporate
> > the IETF feature registry directly into the W3C work, in a way which is
> > traceable through IETF specifications.   Without this, I predict that the
> > parties who are looking to use the W3C work (notably, mobile phone
> > companies) will simply go away and invent their own set of media features,
> > without any kind of clear relationship to the IETF features.
>
>The w3c approach is encouraging them to do this anyway, by having
>all media features be URIs that anyone can create/assign without any
>agreement from anyone else.

So we should roll over and play dead, and pretend that interoperability 
doesn't matter?

Actually, that's a misrepresentation of the W3C position, which is that 
vocabularies gain currency through use -- the more people who use them, the 
more useful, and more widely used they become. (Sure, that's a 
generalization.)  This approach seems to be very much in the spirit of the 
IETF I've been participating in over the past few years -- it's not our 
role to decide what will and will not work, but to provide an environment 
in which new technologies can evolve and find currency, and promote 
interoperability wherever we can.

> > In summary:  URIs *will* be used to identify protocol parameters.  The
> > IETF cannot prevent that.  What the IETF can do by supporting a
> > particular form of such use is to try and ensure that such use remains
> > bound by a clear, authoritative chain of specifications to the IETF
> > specification of what such parameters mean.  The harm that comes from
> > not doing this, in my view, is that we end up with a multiplicity of
> > URIs that mean nearly, but not quite, the same thing as an IETF protocol
> > parameter.  That outcome, I submit, cannot be good for longer term
> > interoperability between IETF and other organizations' specifications.
>
>The likely consequence of what is being proposed is for the URIs that we
>define to mean nearly, but not quite, the same thing as an IETF protocol
>parameter - but we have to try to pretend that they mean the same thing.
>And it will degrade interoperability.

Er, no:  we *define* them to mean the *same* thing.  If implementations 
play fast and loose with the defined meaning, that's nothing new.

> > >d) embed NO visible structure in the URNs - just assign each
> > >    parameter value a sequence number.  people who want to use
> > >    those URNs in XML or whatever would need to look them up at IANA's
> > >    web site.
> >
> > I disagree.  This requirement actively works against one of the
> > motivations for using URIs in application data formats;  that there be
> > a scalable framework for different organizations and persons to mint
> > their own identifiers.
>
>The fact that people want to use URIs in this way does not mean that it's
>appropriate to use URNs in this way.  If people want to mint their own URNs,
>then they have to follow the rules for URNs.  Those rules *do not*
>permit arbitrary organizations and persons to mint their own identifiers
>without explicit delegation from a URN namespace, for very good reasons
>which are consistent with URNs' purposes.

Ah, that's a misunderstanding.  One of the reasons I favour using URNs in 
this way (and contrary to the often touted W3C position) is that it 
provides a form of URI that is clearly *not* minted by any Tom, Dick or 
Harry working in isolation.  The definition of any urn:ietf:... URI is 
subject to the IETF consensus process, so can be expected to have
undergone some level of community review.  My point here was that,
because they conform to a common URI syntactic framework, they can be used
interchangeably in some contexts with experimental and private-use
identifiers.  (In a sense, this might be viewed as a converse of the
X-header approach:  arbitrary URIs may be treated as experimental or
private use, unless they are allocated within a URI namespace controlled by
a recognized authority in the area of their application.)
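
To make that "converse of the X-header approach" concrete, here is a rough
sketch of how an application might apply it.  The only prefix taken from
this discussion is "urn:ietf:";  the function name, the labels and the
example URIs are illustrative assumptions, not part of any specification.

```python
# Prefixes treated as "controlled by a recognized authority".
# Only "urn:ietf:" comes from the discussion; an application would
# supply its own list (assumption).
RECOGNIZED_PREFIXES = ("urn:ietf:",)


def classify(uri: str) -> str:
    """Label a URI 'registered' if it falls under a recognized
    namespace prefix, otherwise 'private-use'."""
    # The "urn" scheme name and the namespace identifier are
    # case-insensitive, so compare the prefix in lowercase.
    lowered = uri.lower()
    if any(lowered.startswith(prefix) for prefix in RECOGNIZED_PREFIXES):
        return "registered"
    return "private-use"


# Hypothetical example URIs, for illustration only.
print(classify("urn:ietf:params:example-class:example-value"))
print(classify("http://example.org/features/colour-depth"))
```

Both kinds of identifier pass through the same syntactic slot;  only the
prefix check distinguishes the ones backed by a consensus process.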

>The very temptation to treat URNs as if they were as malleable as other
>URIs is part of what makes this proposal dangerous.  Since I think that
>URNs *will* be widely misused if they are used for protocol elements,
>I'd far rather have IANA assign ordinary URIs for this - then we will
>still get semantic drift but at least it won't dilute the value of URNs.

In what sense are URNs not ordinary URIs?  They have particular 
requirements for persistence that are not shared by all URI schemes.  And 
there is a requirement for "location independence", but what that means 
isn't always clear.

But mainly, the goal of this proposal is emphatically *not* to make URNs
"malleable" (in the sense of, say, http: URIs which can be reassigned at 
will by domain owners), but to allow the introduction of some URIs that can 
clearly be seen to be stable and persistent.

I'd be happy for IANA to assign "ordinary URIs", assuming that by this you 
mean something like http://www.ietf.org/..., as long as there was a clear 
organizational commitment that such a URI, once allocated, would never be 
reallocated for any other purpose.   It's the particular properties of URNs 
that are desired here, not any sense that they are somehow a "special" form 
of URIs.

> > To use an identifier, one must:
> >
> > (i) have a framework for assigning identifier values, in such a way
> > that it is possible by some means for a human to locate its defining
> > specification.  I can't see how to do this without exploiting a visible
> > syntactic structure in the name.
>
>ISBNs do not have a visible syntactic structure, at least, not an
>obvious one.  But they're quite frequently used to look up book information.

I understand that ISBNs aren't persistent -- they get reused.  How many 
books are "in print" at any time?  I don't think this is quite Internet scale.

Anyway, ISBNs *do* have an internal syntactic structure.  From
http://www.isbn.org/standards/home/isbn/us/isbnqa.asp#Q4:

[[
Does the ISBN have any meaning imbedded in the numbers?

The four parts of an ISBN are as follows:
Group or country identifier which identifies a national or geographic 
grouping of publishers;
Publisher identifier which identifies a particular publisher within a group;
Title identifier which identifies a particular title or edition of a title;
Check digit is the single digit at the end of the ISBN which validates the 
ISBN.
]]
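
As a rough illustration of that structure, the sketch below splits a
hyphenated 10-digit ISBN into the four parts listed above and verifies the
check digit.  It assumes the 10-digit form with hyphens present (the part
lengths vary, so they cannot be recovered without the hyphens) and the
standard modulus-11 check;  the function names are my own.

```python
def split_isbn10(isbn: str) -> tuple:
    """Split a hyphenated ISBN-10 into (group, publisher, title, check)."""
    parts = isbn.split("-")
    if len(parts) != 4:
        raise ValueError("expected four hyphen-separated parts")
    return tuple(parts)


def isbn10_is_valid(isbn: str) -> bool:
    """Check the ISBN-10 check digit: the weighted sum of the ten
    characters (weights 10 down to 1) must be divisible by 11."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            value = int(ch)
        elif ch in "Xx" and position == 9:
            value = 10  # 'X' stands for 10, and only as the check digit
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


print(split_isbn10("0-306-40615-2"))
print(isbn10_is_valid("0-306-40615-2"))
```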

> > (ii) have a framework for actually using the identifier in an
> > application:  in this case, I agree that the identifier should
> > generally be treated as opaque.
> >
> > Also, I think (d) contradicts your goal (a):  I cannot conceive any
> > scalable resolution mechanism that does not in some sense depend on
> > syntactic decomposition of the name.
>
>You should really read up on the CNRI handle system then.  There are a lot
>of things I don't like about it but it really was designed to have exactly
>this property.

Based on a December 2001 article 
(http://www.dlib.org/dlib/december01/blanchi/12blanchi.html), it seems to 
me that Handles too depend on some syntactic structure to partition the 
search space -- based on dynamic content types and metadata schema.  (I 
should be clear that I'm using the term syntactic structure in an abstract 
sense, a la McCarthy 
(http://www-formal.stanford.edu/jmc/towards/node12.html#SECTION000120000000000000000), 
rather than in the sense of a specific arrangement of characters.)

Ah yes, and according to the internet draft on handles:
   http://www.ietf.org/internet-drafts/draft-sun-handle-system-09.txt
there *is* a clear syntactic structure:
[[
  2. Handle Namespace

     Every handle consists of two parts: its naming authority, otherwise
     known as its prefix, and a unique local name under the naming
     authority, otherwise known as its suffix. The naming authority and
     local name are separated by the ASCII character "/". A handle may
     thus be defined as:

       <Handle> ::= <Handle Naming Authority> "/" <Handle Local Name>
  ]]
How each naming authority deals with scaling within its domain of authority 
doesn't seem to be specified.
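
A minimal sketch of the decomposition that grammar implies is given below.
Splitting at the *first* "/" is my assumption (so that a local name could
itself contain "/");  the function name and the example handle are
hypothetical.

```python
from typing import Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Split a handle into (naming authority, local name).

    Follows <Handle> ::= <Handle Naming Authority> "/" <Handle Local Name>.
    Assumption: the first "/" is the separator, so any later "/"
    characters stay in the local name.
    """
    authority, separator, local_name = handle.partition("/")
    if not separator or not authority or not local_name:
        raise ValueError("handle must have the form <authority>/<local name>")
    return authority, local_name


# Hypothetical handle, for illustration only.
authority, local_name = split_handle("1234.5/some/local-name")
print(authority)
print(local_name)
```

A resolver can route on the authority part alone and leave the local name
opaque, which is the kind of partitioning of the search space I had in mind.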

(Actually, after I wrote the above, I realized that I misspoke
slightly, because some systems work in constrained contexts -- I was 
referring to systems operating at global Internet scale without further 
contextualization.  But I think the general idea still holds here -- if you 
want to reliably and quickly dereference an identifier with Internet scope, 
it cannot be completely opaque.)

#g

-------------------
Graham Klyne
<GK@NineByNine.org>