On 09/04/2013 02:23 PM, Junio C Hamano wrote:
> "whitespace-separated" implies that we may allow fields separated with
> not a single SP, but with double SPs or even HTs between them.  I
> personally do not think we should be so loose

Okay, I will look at making it more strict.  See proposed format below.

>> +Quote arguments containing whitespace as if in C source code.
>
> Probably "as if they were strings in C source code".

Fixed.

>> +terminate each argument by NUL and each line by LF NUL.
>
> This is a somewhat interesting choice of the record terminator.  Do
> we have a precedent to use LF NUL elsewhere?  If this is the first
> such case where we need to express variable number of NUL-separated
> fields in a record, I think I am fine with LF NUL (but I am sure
> other people would give us a better convention if we ask them
> politely ;-)), but I just want to make sure we do it the same way as
> other codepaths, if exist, that have to handle this kind of thing.

Nothing else uses LF NUL.  I chose it as a starting point for this
very discussion, which I asked about in $gmane/233653.  In this
particular use case we know the last field will never be LF but that
may not be so for future cases.  There is no way to represent
sentinel-terminated arbitrary variable-width records of NUL-terminated
fields without some kind of escaping for the sentinel value, but the
whole point of -z is to avoid escaping.

Below is a survey of all mentions of NUL and \0 in documented formats
as of v1.8.4.  The summary is that most are fixed-width records but a
few have variable width allowing n or n+1 fields.  In all
variable-width cases there is structured information in the first
field that indicates the number of NUL-terminated fields to expect.

In the motivating case here, we could use a --no-old or --have-old
option to indicate in one field how many more to expect in the record,
but that will be quite verbose.

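To make the LF NUL concern concrete, here is a minimal Python sketch of
the ambiguity.  The encode() helper is hypothetical and exists only for
this illustration; it simply follows the "terminate each argument by NUL
and each line by LF NUL" rule quoted above.

```python
def encode(records):
    """Serialize records: each field is followed by NUL, each record by LF NUL.

    Hypothetical helper for illustration only; not part of git.
    """
    out = b""
    for fields in records:
        for field in fields:
            out += field + b"\0"
        out += b"\n\0"  # record terminator
    return out


# One record whose middle field happens to be a lone LF ...
one_record = encode([[b"a", b"\n", b"b"]])

# ... produces exactly the same bytes as two one-field records.
two_records = encode([[b"a"], [b"b"]])

print(one_record == two_records)
```

Since both inputs serialize to the same byte string, a reader cannot
tell them apart without either escaping LF or knowing the field count
some other way.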
Side note: I'd like to reserve room for the leading options to include
things like "-m NUL <reason> NUL" so we cannot keep them all in a
single NUL-terminated, SP-separated field.

Another approach is to introduce a way to represent "not here" for the
<oldvalue> argument that is not an otherwise valid value.  This would
make the non-option part of the record have a fixed width of 3 fields.
For example, we could use SP in -z mode:

 [-<opt> NUL]... <ref> NUL <new> NUL (<old>|SP) NUL

and the last field can be optional in non-z mode anyway:

 [-<opt> SP]... <ref> SP <new> [SP <old>] LF

Or we could use a character like "~" (other ideas?):

 [-<opt> NUL]... <ref> NUL <new> NUL (<old>|~) NUL

and make it available in non-z mode too:

 [-<opt> SP]... <ref> SP <new> [SP (<old>|~)] LF

Thoughts?
-Brad


Survey of NUL in documented formats:
------------------------------------------------------------------------
* Documentation/diff-format.txt:
  The -z mode for --numstat prints NUL-terminated lines but there
  is exactly one path at the end of each entry and the earlier
  fields are separated by TAB because they are structured.

* Documentation/diff-options.txt:
  The -z mode for diff-tree output prints structured SP/TAB-separated
  fields in a NUL-terminated field followed by either one or two
  NUL-terminated paths.  This is variable width but the first field
  tells us how wide.

* Documentation/git-apply.txt:
  The -z mode forwards to --numstat diff options.

* Documentation/git-check-attr.txt:
  The -z mode for stdin reads NUL-terminated paths.

* Documentation/git-check-ignore.txt:
  The -z mode for stdin reads NUL-terminated paths.
  The -z mode for output prints a fixed-width table with every
  group of four NUL-terminated fields forming a row.

* Documentation/git-checkout-index.txt:
  The -z mode reads NUL-terminated paths.

* Documentation/git-commit.txt:
  The -z mode forwards to git-status.

* Documentation/git-grep.txt:
  The -z mode separates file names from the matched line by a NUL.
  Therefore NUL divides LF-terminated lines into two pieces.

* Documentation/git-ls-files.txt:
  The -z mode prints NUL-terminated lines but there is exactly one
  path at the end of each entry and the earlier fields are separated
  by SP and TAB because they are structured.

* Documentation/git-ls-tree.txt:
  The -z mode prints NUL-terminated lines but there is exactly one
  path at the end of each entry and the earlier fields are separated
  by SP and TAB because they are structured.

* Documentation/git-mktree.txt:
  The -z mode reads NUL-terminated lines as output by ls-tree -z.

* Documentation/git-status.txt:
  The -z mode of --porcelain separates a variable number of entries
  by NUL.  The beginning of each entry allows one to know the number
  of NUL-terminated fields to expect (A = 1 total NUL, R = 2 total
  NULs, etc.).

* Documentation/git-update-index.txt:
  The -z mode of --stdin separates paths by NUL.
  The -z mode of --index-info separates entries by NUL but there is
  exactly one path at the end of each entry and the earlier fields
  are separated by SP and TAB because they are structured.

* Documentation/rev-list-options.txt:
  The --header option prints commits separated by NUL but they are
  never empty.
------------------------------------------------------------------------
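
The "first field tells us how wide" pattern noted for git-status in the
survey can be sketched in Python as follows.  This is a simplified
illustration of the pattern, not git's own parser: parse_status_z is a
hypothetical name, and the sketch assumes each entry starts with two
status letters and one SP before the path, and that R or C in the first
column means one extra NUL-terminated path follows.

```python
def parse_status_z(data):
    """Split status-style -z output into (status, path, original_path) tuples.

    Simplified sketch: assumes "XY SP path NUL", plus one more
    "path NUL" when X is R (rename) or C (copy).
    """
    fields = data.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final NUL

    entries = []
    i = 0
    while i < len(fields):
        head = fields[i]
        i += 1
        status, path = head[:2], head[3:]
        original = None
        # The structured first field says whether a second field follows.
        if status[:1] in (b"R", b"C"):
            if i >= len(fields):
                raise ValueError("truncated entry: missing original path")
            original = fields[i]
            i += 1
        entries.append((status.decode("ascii"), path, original))
    return entries


sample = b"A  new.txt\0R  to.txt\0from.txt\0 M mod.txt\0"
for entry in parse_status_z(sample):
    print(entry)
```

The reader never needs a record terminator here: it decides how many
NUL-terminated fields to consume purely from the start of each entry.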