--On Tuesday, October 15, 2013 08:03 -0700 SM <sm@xxxxxxxxxxxx> wrote:

> At 01:54 15-10-2013, John C Klensin wrote:
>> My reasoning is that, while the change seems fine, the
>> precedent seems atrocious.  If this is approved via
>> Independent Stream publication and the next case that comes
>> along is, unlike this one, generally hated by the community,
>> the amount of hair-splitting required to deny that one having
>> approved this one would be impressive... and bad for the IETF.

> I was unable to see whether this specification would be of use
> to data.gov as the site is still inaccessible.  There are
> opendata sites in Brazil ( http://dados.gov.br/ ), France (
> http://www.data.gouv.fr/ ) and several other countries.  The
> specification may be relevant to opendata, which is something
> of interest to governments.  It would have been better if the
> specification had been processed in the IETF Stream, but the
> community was not interested in taking it up (see msg-id:
> CAC4RtVAeTGpHFA01YX=PS7CYeOfYFS0Sc-g3wb05USnoWyUJMQ@mail.gmail.com).
>...

As long as the question is "should the IETF approve registration of an extension whose documentation is published via the ISE", most of the above is massively irrelevant.  Hence the change in subject line and the copy to Nevil (and I'm not likely to discuss it further on this list).

From the standpoint of retrieval of information from a database (opendata or otherwise), my experience suggests that CSV fragments, especially as specified here, are going to be fairly irrelevant.  Not harmful, and probably fine for those who think they have a use for them, just useless for lots of retrieval functions.  The problem is that, in general and with a dataset of any real size, people don't think of things in terms of row and column numbers (getting them to think that way is quite error-prone).  Especially when there are _lots_ of columns, retrieval by column number is usually a bad idea.  Using an example from the I-D to construct a different one,

   http://example.com/data.csv#col="temperature"

would make a lot more sense in many cases than

   http://example.com/data.csv#col=2

But the spec doesn't allow that, and it is perhaps better handled with a query rather than a fragment (although that opens the problem of where queries are processed, which has tied the URNbis WG in knots).

It is also not unusual with statistical and scientific databases (especially non-relational ones) to have named rows as well as named columns, but, while RFC 4180 allows for "; header=present", it makes no provision for 'rowNames="present"', much less what many data analysis packages would really like to see, which would be something like 'rowNames="present, col=NN"', with the latter designating the "column" (or columns) of the CSV file in which those names appear, perhaps borrowing from regular expressions and allowing "$" and "$-1" instead of NN.  There is no reason why one couldn't have those sorts of arrangements with a CSV format, and some systems do, but it isn't the text/csv of RFC 4180.

For datasets of non-trivial size and complexity, fragments as specified here are going to be really useful only when the application retrieves what used to be called a "codebook", first uses it to change row and column identifiers or other query-supporting information into row and column numbers, and then constructs the URI with this fragment ID.
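To make that two-step dance concrete, here is a rough sketch (Python, purely illustrative; the URL, the column name, and the idea of using the header row as a stand-in for a codebook are all my own assumptions, not anything the draft specifies):

   import csv
   import io
   import urllib.request

   # Hypothetical data file and column name, for illustration only.
   DATA_URL = "http://example.com/data.csv"
   WANTED_COLUMN = "temperature"

   # Step 1: fetch the file (or a codebook describing it) and read the
   # header row to learn which column carries the wanted name.
   with urllib.request.urlopen(DATA_URL) as resp:
       text = resp.read().decode("utf-8")
   header = next(csv.reader(io.StringIO(text)))

   # Step 2: translate the name into the 1-based column number that the
   # fragment syntax actually wants.
   col_number = header.index(WANTED_COLUMN) + 1

   # Step 3: only now can the fragment URI be constructed.
   fragment_uri = "%s#col=%d" % (DATA_URL, col_number)
   print(fragment_uri)   # e.g. http://example.com/data.csv#col=2

Of course, once the application has retrieved enough of the file (or its codebook) to do that mapping, there is not much left for the fragment to do, which is rather the point.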
I suspect that won't be common, for lots of reasons, starting from the observation that many modern database management and database access technologies discourage detached codebooks.

Those types of application situations also lead to another problem with the fragment approach.  Again going back to the example in the draft, if, instead of

   date,temperature,place
   2011-01-01,1,Galway

one had

   date, time,observed-melting-point
   2011-01-01,0900.3, 0.001
   2011-01-01,0901.2, 0.002
   2011-01-01,0901.8, -0.09

then many systems would do the conversion of the last column to floating point as the data were being read (and might convert times or dates as well).  Depending on how much information was kept, responding to

   http://www.example.com/melting-points.csv#col=3

with

   0.001
   0.002
   -0.09

rather than

   0.001
   0.002
   -0.090

or "0.1 x10**-2", etc., as the spec seems to require, might require significant work.  Some processors would care, others wouldn't, but you see the problem.

Again, I don't see a big problem with this addition, although there are a number of things I'd like to see either made clearer or explicitly warned about.  I don't see arguments about what problems it doesn't solve as relevant unless extravagant claims are made for it, and the current draft avoids such claims.  But those issues are quite separate from the issue of the IESG passing responsibility for documentation and evaluation of a registration modification request (for a registration for which I believe the IESG to be the "owner") off to the ISE.

best,
   john
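p.s. A minimal illustration of the representation point above, assuming (as is common but not universal) a processor that converts numeric columns to floats on read and has discarded the original text:

   # Values as they appear in the file, kept only as text.
   original_text = ["0.001", "0.002", "-0.09"]

   # A processor that converted the column on read has lost the original
   # textual form; re-serializing need not reproduce it exactly.
   as_floats = [float(v) for v in original_text]
   print(["%.3f" % v for v in as_floats])   # ['0.001', '0.002', '-0.090']
   print(["%g" % v for v in as_floats])     # ['0.001', '0.002', '-0.09']

Which of those renderings comes back for #col=3 depends entirely on what the processor happened to keep, not on the fragment spec.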