On Wed, 24 Oct 2012, Manger, James H wrote:
>
> Currently, I don't think url.spec.whatwg.org distinguishes between
> strings that are valid URLs and strings that can be interpreted as URLs
> by applying its standardised error handling. Consequently, error
> handling cannot be at the option of the software developer as you cannot
> tell which bits are error handling.

Well first, the whole point of discussions like this is to work out what
the specs _should_ say; if the specs were perfect then there wouldn't be
any need for discussion.

But second, I believe it's already Anne's intention to add to the parsing
algorithm the ability to abort whenever the URL isn't conforming; he just
hasn't done that yet because he hasn't specced what's conforming in the
first place.

On Tue, 23 Oct 2012, David Sheets wrote:
>
> One algorithm? There seem to be several functions...
>
> - URI reference parsing (parse : scheme -> string -> raw uri_ref)
> - URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
> - absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
> - URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)

I don't understand what your four algorithms are supposed to be. There's
just one algorithm as far as I can tell -- it takes as input an arbitrary
string and a base URL object, and returns a normalised absolute URL
object, where a "URL object" is a conceptual construct consisting of the
components scheme, userinfo, host, port, path, query, and fragment, which
can be serialised together into a string form. (I guess you could count
the serialiser as a second algorithm, in which case there's two.)

> Anne's current draft increases the space of valid addresses.

No, Anne hasn't finished defining conformance yet. (He just started
today.) You may be getting confused by the "invalid flag", which doesn't
mean the input is non-conforming, but means that the input is
uninterpretable.
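As a rough illustration of that single-algorithm view, here is a Python sketch using the stdlib's urllib.parse (which is RFC 3986-flavoured resolution, not necessarily what Anne's draft will end up specifying; the parse_url name is mine, and urlsplit folds userinfo/host/port into one netloc component rather than exposing seven separate fields directly):

```python
from urllib.parse import urljoin, urlsplit, SplitResult

def parse_url(input_string: str, base: str) -> SplitResult:
    """The 'one algorithm': arbitrary string + base URL in,
    normalised absolute URL object out."""
    absolute = urljoin(base, input_string)  # resolve against the base
    return urlsplit(absolute)               # split into components

url = parse_url("../b?x=1#frag", "http://user@example.com:8080/a/c")
# The conceptual URL object's components are then reachable as:
#   url.scheme, url.username (userinfo), url.hostname, url.port,
#   url.path, url.query, url.fragment

# The serialiser (the possible "second algorithm"): back to string form.
serialised = url.geturl()
```

Here the relative reference "../b?x=1#frag" resolves against the base to path "/b" with the query and fragment carried over, and geturl() round-trips the components back into a single string.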
> > The de facto parsing rules are already complicated by de facto
> > requirements for handling errors, so defining those doesn't increase
> > complexity either (especially if such behaviour is left as optional,
> > as discussed above.)
>
> *parse* is separate from *normalize* is separate from checking if a
> reference is absolute (*absp*) is separate from *resolve*.

No, it doesn't have to be. That's actually a more complicated way of
looking at it than necessary, IMHO.

> Why don't we have a discussion about the functions and types involved in
> URI processing?
>
> Why don't we discuss expanding allowable alphabets and production rules?

Personally I think this kind of open-ended approach is not a good way to
write specs. Better is to put forward concrete use cases, technical data,
etc, and let the spec editor take all that into account and turn it into
a standard. Arguing about what precise alphabets are allowed and whether
to spec something using prose or production rules is just bikeshedding.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'