On 16/04/12 21:06, Junio C Hamano wrote: > Andrew Sayers <andrew-git@xxxxxxxxxxxxxxx> writes: >> >> +Unlike Subversion, 'svn-fe' interprets property key/value pairs as >> +null-terminated binary strings. This means it will accept content >> +that Subversion normally wouldn't produce (such as filenames >> +containing tab characters) or would refuse to parse (such as usernames >> +containing Latin-1 characters). However, like Subversion it will >> +handle newlines incorrectly in filenames (see BUGS below). >> + > > Do the first two sentences in the above paragraph claim that it a bug that > 'svn-fe' does not mimick what Subversion does? I am not sure what lessons > the authors of tools, whose output is meant to feed svn-fe, are expected > to learn here. For example, is the purpose of the above paragraph to make > tool authors realize that "NUL terminates key and value, so I have to > refrain from using a key or a value that contains a NUL byte?" [*1*] Even > in that case, it is unclear to me what I (as an author of such a tool that > reads data from somewhere and format it to plesae svn-fe) could do with > that knowledge. > > [Footnote] > > *1* By the way, NULL is a pointer that does not point anywhere. The name > of a byte whose value is 0x00 is NUL. The dumpfile documentation says that "... property key/value pairs may be interpreted as binary data in any encoding by client tools"[1], but SVN itself interprets the data as UTF-8, so I was surprised to see svn-fe hadn't aped that behaviour. You could argue this is a bug if you want to call `svnadmin` the reference implementation. You could even argue that treating NUL characters specially is a bug if you want to call the documentation the official standard (albeit a bug shared by `svnadmin`). Personally I don't have a problem with either decision, so I've just noted some unobvious behaviour. Lessons to learn will depend on the author, but here are some I took: 1. UTF-8 is the most common encoding, but not the only one. If your tool only allows UTF-8 input and only produces only UTF-8 output then you are the limiting factor in your toolchain. In my case I think I'll just live with that, but I would like to have known before I started. 2. `svn` itself isn't universally considered the reference implementation, only a popular one. When deciding the correct behaviour (or the range of possible behaviours), it's not enough just to look at what `svn` does. 3. `svn-fe` doesn't slavishly follow either the documentation or `svn`. Beyond a certain point you have to actually check assumptions against svn-fe's behaviour (then document what you find so the next guy doesn't have to ;). Again, I think this is pragmatic but I would like to have known earlier. I've tried not to labour the above points, because it would be easy to overstate them and because other authors will take different lessons. It's certainly possible that some author would come to svn-fe wanting to do something crazy like encode newlines as NULs and come away realising C doesn't like that. A more concrete example is that a GSoC student who turns up next year to work on writing commits back to SVN will need to know svn-fe chose not to care about UTF-8 and that there's a nest of edge cases waiting for them. - Andrew [1]http://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html