On Tue, Oct 23, 2012 at 4:51 PM, Ian Hickson <ian@xxxxxxxx> wrote:
> On Wed, 24 Oct 2012, Christophe Lauret wrote:
>>
>> As a Web developer who's had to write code multiple times to handle URIs
>> in very different contexts, I actually *like* the constraints in STD 66,
>> there are many instances where it is simpler to assume that the error
>> handling has been done prior and simply reject an invalid URI.
>
> I think we can agree that the error handling should be, at the option of
> the software developer, either to handle the input as defined by the
> spec's algorithms, or to abort and not handle the input at all.

Yes, input is handled according to the specs' algorithmS.

>> But why not do it as a separate spec?
>
> Having multiple specs means an implementor has to refer to multiple specs
> to implement one algorithm, which is not a way to get interoperability.
> Bugs creep in much faster when implementors have to switch between specs
> just in the implementation of one algorithm.

One algorithm? There seem to be several functions...

- URI reference parsing (parse : scheme -> string -> raw uri_ref)
- URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
- absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
- URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)

Of course, some of these may be composed in any given implementation. In
the case of a/@href and img/@src, it appears that something like
(one_algorithm = (resolve base_uri) . normalize . parse (scheme base_uri))
is in use.

A good way to get interop is to thoroughly define each function and supply
implementors with test cases for each processing stage (one_algorithm's
test cases define some tests for parse, normalize, and resolve as well).

Some systems use more than the simple function composition of web
browsers...

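To make the separation of stages concrete, here is a minimal sketch of the four functions and their composition in Python. Everything in it is an assumption made for illustration: the class names RawRef, NormalRef and AbsoluteRef are hypothetical stand-ins for the raw/normal/absolute type parameters, urllib.parse stands in for a real STD 66 parser, normalize only case-folds the scheme and host, and absp treats any reference with a scheme as absolute. It is not how any browser or either spec defines these steps.

```python
from dataclasses import dataclass
from typing import Optional, Union
from urllib.parse import SplitResult, urljoin, urlsplit, urlunsplit


@dataclass(frozen=True)
class RawRef:
    """Hypothetical 'raw uri_ref': split into components, nothing else done."""
    parts: SplitResult


@dataclass(frozen=True)
class NormalRef:
    """Hypothetical 'normal uri_ref': components after normalization."""
    parts: SplitResult


@dataclass(frozen=True)
class AbsoluteRef:
    """Hypothetical 'absolute uri_ref': a normalized reference with a scheme."""
    parts: SplitResult


def parse(scheme: str, text: str) -> RawRef:
    # The scheme argument mirrors the signature in the message above. This
    # sketch applies no scheme-specific parsing rules, so it goes unused.
    return RawRef(urlsplit(text))


def normalize(raw: RawRef) -> NormalRef:
    # Assumption: only case normalization of scheme and host is performed.
    p = raw.parts
    userinfo, at, hostport = p.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()  # userinfo stays case-sensitive
    return NormalRef(p._replace(scheme=p.scheme.lower(), netloc=netloc))


def absp(ref: NormalRef) -> Optional[AbsoluteRef]:
    # Assumption: "absolute" here just means "has a scheme".
    return AbsoluteRef(ref.parts) if ref.parts.scheme else None


def resolve(base: AbsoluteRef,
            ref: Union[RawRef, NormalRef, AbsoluteRef]) -> AbsoluteRef:
    # Reference resolution is delegated to urljoin in this sketch.
    joined = urljoin(urlunsplit(base.parts), urlunsplit(ref.parts))
    return AbsoluteRef(urlsplit(joined))


def one_algorithm(base: AbsoluteRef, text: str) -> AbsoluteRef:
    # (resolve base_uri) . normalize . parse (scheme base_uri)
    return resolve(base, normalize(parse(base.parts.scheme, text)))


if __name__ == "__main__":
    base = absp(normalize(parse("http", "HTTP://Example.COM/a/b")))
    assert base is not None
    print(urlunsplit(one_algorithm(base, "../c?q#frag").parts))
```

Because each stage has its own input and output type, test cases can target parse, normalize, absp and resolve individually as well as the composed one_algorithm.
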
>> Increasing the space of valid addresses, when the set of addressable
>> resources is not actually increasing only means more complex parsing rules.
>
> I'm not saying we should increase the space of valid addresses.

Anne's current draft increases the space of valid addresses. This isn't
obvious as Anne's draft lacks a grammar and URI component alphabets. You
support Anne's draft and its philosophy, therefore you are saying the
space of valid addresses should be expanded.

Here is an example of a grammar extension that STD 66 disallows but
WHATWGRL allows:

<http://www.rfc-editor.org/errata_search.php?rfc=3986&eid=3330>

> The de facto parsing rules are already complicated by de facto
> requirements for handling errors, so defining those doesn't increase
> complexity either (especially if such behaviour is left as optional, as
> discussed above.)

*parse* is separate from *normalize* is separate from checking if a
reference is absolute (*absp*) is separate from *resolve*.

Why don't we have a discussion about the functions and types involved in
URI processing?

Why don't we discuss expanding allowable alphabets and production rules?

David