Re: Last Call: An IETF URN Sub-namespace for Registered Protocol Parameters to BCP

Keith Moore <moore@cs.utk.edu> · Wed, 03 Jul 2002 17:02:11 -0400

> > > Spurred by XML and related technologies (which I assert are far more than
> > > mere "fashion") we are seeing URIs used for a wide range of purposes which
> > > are not constrained by a requirement for dereferencing.   The use of URIs
> > > for identifying arbitrary things is now a fact of life, and in some
> > > technical domains is proving to be extremely useful.  You claim "harm",
> > > but I recognize no such harm.
> >
> >Clarification: I claim "harm" for the proposed use of *URNs* because
> >URNs were designed to be long-term stable names for (at least potentially)
> >network-accessible resources, whereas the proposal is to use them as a
> >way of generating globally unique strings like UUIDs or OIDs.
> 
> I still don't see the "harm" here.

Basically, it's trivializing them.  They're overkill for this purpose, and
using URNs for this purpose makes them seem less useful than they really
are.

That, and I think there'll be very strong demand for them to be human-readable
(i.e. to have visible structure) and syntactically derivable from the canonical
name of the protocol element (for those that have such names).

> >I'm all for reuse of data models where it makes sense, but if the goal
> >is really to "lock the various syntactic forms to a common semantic
> >definition" (presumably one which is compatible with XML) then I take
> >strong issue with that, as the XML model is quite dysfunctional for
> >many purposes.  (as are the others, it's just that XML is the current
> >bandwagon)
> 
> I'm puzzled -- you appear to be arguing my point.  Yes, different syntactic 
> frameworks will (in isolation) tend to yield differing semantics.  Yes, 
> different syntactic frameworks are better suited for different 
> purposes.  But it seems to me that referring different uses to the same 
> original definition would help to inhibit that -- and if factors like 
> ordering or grouping are significant, then the definition will (hopefully) 
> capture that and place constraints on the syntactic contexts for re-use.

I just don't happen to share your faith in this as a mechanism to inhibit
or discourage semantic drift.  In every example I can think of where one
data model is exported into a different context there has been semantic
drift, even when the same names and official definitions were retained.
(Maybe there's less drift this way, maybe not - but it certainly doesn't
prevent drift.)

> >Using URIs for the names of the data elements won't stop that kind of drift.
> 
> But not trying to re-use existing definitions seems to be a recipe for 
> Balkanization.

I don't know how to avoid Balkanization.  Sometimes it seems better to
let data models fork than to try to reconcile the various differences -
I'd cite RFC 822, Usenet, HTTP, and SIP as good examples of things that
we shouldn't pretend have the same protocol elements even though
we recognize that they share a common ancestry.

> Maybe it won't work for all applications, but I think there are a 
> substantial number of cases where re-use of existing definitions is a 
> reasonable and desirable goal.  

I don't claim that re-use of a data model is not potentially useful.
If nothing else, an existing data model can serve as a useful starting 
point for a new data model when the requirements or syntactic structures
dictate not using the old one.

But neither do I want to give official blessing to folks to re-cast 
traditional IETF protocols into new syntactic forms.  And a lot of
the interest I've seen in having URI equivalents for IETF protocol
parameter names was from people who wanted to do just that - often
with the explicit intent of producing variant implementations in order
to disrupt the installed base.  

> >But neither do we have to endorse it just so they will use our stuff.
> >Especially when their using our stuff dilutes the utility of our stuff
> >by not requiring widespread agreement on the media features used.
> 
> Come again?  That seems to me to be entirely non-sequitur.  How can other 
> people using our stuff dilute its utility?  It is precisely in the nature 
> of this proposal that using these URIs would be assenting to the IETF 
> definition of their meaning.

No, it's not, because of the semantic drift that will occur.

Someone once tried to demonstrate to me that it was perfectly reasonable
to express iCalendar events in XML - but her demonstration used XML's
date representation, which didn't have a proper concept of timezones.
Interpretation of dates in iCalendar was dependent on a separate timezone
element, whereas the XML tool wanted to treat those dates as standalone.
So the "obvious" conversion of iCalendar to XML - even though the elements
mapped one-to-one - caused semantic drift and a loss of important
functionality.
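
To make the timezone point concrete, here is a minimal Python sketch under
stated assumptions.  It takes an RFC 2445-style DTSTART line whose TZID
parameter refers to a separate VTIMEZONE component; the offset that component
would supply (UTC-5 for "US-Eastern" in January) is hard-coded rather than
computed from the VTIMEZONE rules.  The "obvious" one-to-one mapping is
modelled, hypothetically, as a tool that keeps the value, drops the TZID, and
treats the time as standalone UTC.  The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the offset a separate VTIMEZONE component would supply for
# TZID "US-Eastern" in January (standard time).  Hard-coded for illustration;
# a real converter would have to evaluate the VTIMEZONE rules.
VTIMEZONE_OFFSETS = {"US-Eastern": timedelta(hours=-5)}

def parse_dtstart(line: str) -> datetime:
    """Faithful reading: the local time only means something with its TZID."""
    name_params, value = line.split(":", 1)
    params = dict(p.split("=", 1) for p in name_params.split(";")[1:])
    local = datetime.strptime(value, "%Y%m%dT%H%M%S")
    return local.replace(tzinfo=timezone(VTIMEZONE_OFFSETS[params["TZID"]]))

def naive_xml_mapping(line: str) -> datetime:
    """Hypothetical one-to-one mapping: copy the value, drop the TZID
    parameter, and let the consuming tool treat it as standalone UTC."""
    _, value = line.split(":", 1)
    return datetime.strptime(value, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)

line = "DTSTART;TZID=US-Eastern:19980119T020000"
faithful = parse_dtstart(line)
drifted = naive_xml_mapping(line)
print(faithful - drifted)  # 5:00:00
```

Every element maps one-to-one, yet the two readings of the same event start
five hours apart, which is exactly the kind of drift described above.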

> > > This URN namespace proposal will provide a way to incorporate
> > > the IETF feature registry directly into the W3C work, in a way which is
> > > traceable through IETF specifications.   Without this, I predict that the
> > > parties who are looking to use the W3C work (notably, mobile phone
> > > companies) will simply go away and invent their own set of media features,
> > > without any kind of clear relationship to the IETF features.
> >
> >The W3C approach is encouraging them to do this anyway, by having
> >all media features be URIs that anyone can create/assign without any
> >agreement from anyone else.
> 
> So we should roll over and play dead, and pretend that interoperability 
> doesn't matter?

It's not clear that doing things the W3C way helps interoperability.

> Actually, that's a misrepresentation of the W3C position, which is that 
> vocabularies gain currency through use -- the more people who use them, the 
> more useful, and more widely used they become. 

That's true to a point, but it also seems to be the case that controlled
vocabularies that need to have consistent meaning across large groups
need very careful definition and, well, "control".  Natural languages,
by contrast, tend to drift continuously.  Sometimes that's useful, but
perhaps less so for computer protocols than for humans, who can
intuitively accommodate a certain amount of semantic skew.

> >The likely consequence of what is being proposed is for the URIs that we
> >define to mean nearly, but not quite, the same thing as an IETF protocol
> >parameter - but we have to try to pretend that they mean the same thing.
> >And it will degrade interoperability.
> 
> Er, no:  we *define* them to mean the *same* thing.  If implementations 
> play fast and loose with the defined meaning, that's nothing new.

At the same time, by explicitly exporting them we are encouraging 
semantic drift.  

> >The very temptation to treat URNs as if they were as malleable as other
> >URIs is part of what makes this proposal dangerous.  Since I think that
> >URNs *will* be widely misused if they are used for protocol elements,
> >I'd far rather have IANA assign ordinary URIs for this - then we will
> >still get semantic drift but at least it won't dilute the value of URNs.
> 
> In what sense are URNs not ordinary URIs?  They have particular 
> requirements for persistence that are not shared by all URI schemes.  

In order to make a URN persistent you really need to make it opaque 
(or mostly so) to humans.   It's really too bad that we even allowed
URN namespace IDs to be human-meaningful, but that's water under the bridge.

> > > (i) have a framework for assigning identifier values, in such a way that it
> > > is possible by some means for a human to locate its defining
> > > specification.  I can't see how to do this without exploiting a visible
> > > syntactic structure in the name.
> >
> >ISBNs do not have a visible syntactic structure, at least, not an
> >obvious one.  But they're quite frequently used to look up book information.
> 
> I understand that ISBNs aren't persistent -- they get reused.  

They're not supposed to be, but it does happen in some countries - 
particularly those with less ISBN space allocated to them.  
So we have a NAT-like problem for ISBNs ...

> Anyway, ISBNs *do* have an internal syntactic structure.  

I didn't say they didn't have one, I just said it was not obvious.
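
As a small illustration of structure that exists but isn't obvious: the
ISBN-10 0-306-40615-2 encodes a group (0), a publisher (306), a title (40615)
and a check digit (2), but the group and publisher fields are variable-length,
so their boundaries can't be recovered from the bare string 0306406152 without
the registration-range tables.  About the only structure checkable from the
digits alone is the check digit, as this Python sketch shows (the function
name is made up for illustration).

```python
def isbn10_check_ok(isbn: str) -> bool:
    """Return True if a 10-character ISBN (hyphens allowed) has a valid check digit.

    Weights run 10..1 and the weighted sum must be divisible by 11.
    This says nothing about where the group/publisher/title boundaries lie.
    """
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch in "Xx" and weight == 1:
            value = 10  # 'X' stands for 10, allowed only as the check digit
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += weight * value
    return total % 11 == 0

print(isbn10_check_ok("0-306-40615-2"))  # True
print(isbn10_check_ok("0306406152"))     # True: same number, no visible boundaries
print(isbn10_check_ok("0-306-40615-3"))  # False
```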

> 
> > > (ii) have a framework for actually using the identifier in an
> > > application:  in this case, I agree that the identifier should generally be
> > > treated as opaque.
> > >
> > > Also, I think (d) contradicts your goal (a):  I cannot conceive any
> > > scalable resolution mechanism that does not in some sense depend on
> > > syntactic decomposition of the name.
> >
> >You should really read up on the CNRI handle system then.  There are a lot
> >of things I don't like about it but it really was designed to have exactly
> >this property.
> 
> Based on a December 2001 article 
> (http://www.dlib.org/dlib/december01/blanchi/12blanchi.html), it seems to 
> me that Handles too depend on some syntactic structure to partition the 
> search space -- based on dynamic content types and metadata schema.  

Handles have evolved a bit since first envisioned - as I understand it, the 
problem wasn't the inability of the non-partitioned search service to scale 
up to the number of queries but rather the difficulties associated with
everybody trusting a centrally maintained flat search service.

Someone from CNRI might be able to fill in more detail.

> Ah yes, and according to the internet draft on handles:
>    http://www.ietf.org/internet-drafts/draft-sun-handle-system-09.txt
> there *is* a clear syntactic structure:

Yes, but the searching isn't (didn't use to be) federated according to that
structure.  The scalability of the searching didn't depend on it - 
federating actually slowed things down unless you happened to consult the 
right server first.  (locality does affect search speed)

> But I think the general idea still holds here -- if you 
> want to reliably and quickly dereference an identifier with Internet scope, 
> it cannot be completely opaque.)

Hashing is faster than tree searching, especially if the tree is distributed.
You federate the lookup because of trust issues (which are a kind of scaling
issue, but not in terms of bandwidth or CPU cycles) and ease-of-cost-recovery
issues, not to make the lookup more efficient or cheaper.
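
The contrast in the paragraph above can be put as a toy model that counts how
many servers a resolver has to contact.  Everything in this Python sketch is
hypothetical: the identifier 10.1000/182 merely uses handle-style
prefix/suffix syntax, the nested dict stands in for a tree of delegated
servers (one contact per prefix label, plus one for the suffix), and the flat
dict stands in for a single centrally maintained, hash-indexed service.

```python
from typing import Dict, Tuple

def flat_lookup(table: Dict[str, str], handle: str) -> Tuple[str, int]:
    """Opaque lookup: hash the whole identifier once at a single service.
    Returns (value, number of servers contacted)."""
    return table[handle], 1

def delegated_lookup(root: dict, handle: str) -> Tuple[str, int]:
    """Federated lookup: one (hypothetical) server contact per prefix label,
    then one more for the suffix.  Returns (value, number of servers contacted)."""
    prefix, suffix = handle.split("/", 1)
    node, contacts = root, 0
    for label in prefix.split("."):
        node = node[label]  # stands for a round trip to the next delegated server
        contacts += 1
    contacts += 1           # final query to the server holding the suffix
    return node[suffix], contacts

target = "http://example.org/doc"  # hypothetical resolution result
flat_table = {"10.1000/182": target}
tree = {"10": {"1000": {"182": target}}}

print(flat_lookup(flat_table, "10.1000/182"))  # ('http://example.org/doc', 1)
print(delegated_lookup(tree, "10.1000/182"))   # ('http://example.org/doc', 3)
```

The model deliberately leaves out trust and cost recovery; those, rather than
probe counts, are what push toward a federated design.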

Keith