On Wed, 24 Oct 2012, Manger, James H wrote:
>
> Currently, I don't think url.spec.whatwg.org distinguishes between
> strings that are valid URLs and strings that can be interpreted as URLs
> by applying its standardised error handling. Consequently, error
> handling cannot be at the option of the software developer as you cannot
> tell which bits are error handling.

Well first, the whole point of discussions like this is to work out what
the specs _should_ say; if the specs were perfect then there wouldn't be
any need for discussion.

But second, I believe it's already Anne's intention to add to the parsing
algorithm the ability to abort whenever the URL isn't conforming; he just
hasn't done that yet because he hasn't specced what's conforming in the
first place.

On Tue, 23 Oct 2012, David Sheets wrote:
>
> One algorithm? There seem to be several functions...
>
> - URI reference parsing (parse : scheme -> string -> raw uri_ref)
> - URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
> - absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
> - URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)

I don't understand what your four algorithms are supposed to be. There's
just one algorithm as far as I can tell -- it takes as input an arbitrary
string and a base URL object, and returns a normalised absolute URL
object, where a "URL object" is a conceptual construct consisting of the
components scheme, userinfo, host, port, path, query, and fragment, which
can be serialised together into a string form. (I guess you could count
the serialiser as a second algorithm, in which case there's two.)

> Anne's current draft increases the space of valid addresses.

No, Anne hasn't finished defining conformance yet. (He just started
today.) You may be getting confused by the "invalid flag", which doesn't
mean the input is non-conforming, but means that the input is
uninterpretable.
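As a rough illustration of that single-algorithm view, here is a Python sketch using the stdlib's urllib.parse (which is RFC 3986-flavoured resolution, not necessarily what Anne's draft will end up specifying; the parse_url name is mine, and urlsplit folds userinfo/host/port into one netloc component rather than exposing seven separate fields directly):

```python
from urllib.parse import urljoin, urlsplit, SplitResult

def parse_url(input_string: str, base: str) -> SplitResult:
    """The 'one algorithm': arbitrary string + base URL in,
    normalised absolute URL object out."""
    absolute = urljoin(base, input_string)  # resolve against the base
    return urlsplit(absolute)               # split into components

url = parse_url("../b?x=1#frag", "http://user@example.com:8080/a/c")
# The conceptual URL object's components are then reachable as:
#   url.scheme, url.username (userinfo), url.hostname, url.port,
#   url.path, url.query, url.fragment

# The serialiser (the possible "second algorithm"): back to string form.
serialised = url.geturl()
```

Here the relative reference "../b?x=1#frag" resolves against the base to path "/b" with the query and fragment carried over, and geturl() round-trips the components back into a single string.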
> > The de facto parsing rules are already complicated by de facto
> > requirements for handling errors, so defining those doesn't increase
> > complexity either (especially if such behaviour is left as optional,
> > as discussed above.)
>
> *parse* is separate from *normalize* is separate from checking if a
> reference is absolute (*absp*) is separate from *resolve*.

No, it doesn't have to be. That's actually a more complicated way of
looking at it than necessary, IMHO.

> Why don't we have a discussion about the functions and types involved in
> URI processing?
>
> Why don't we discuss expanding allowable alphabets and production rules?

Personally I think this kind of open-ended approach is not a good way to
write specs. Better is to put forward concrete use cases, technical data,
etc, and let the spec editor take all that into account and turn it into
a standard. Arguing about what precise alphabets are allowed and whether
to spec something using prose or production rules is just bikeshedding.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'