Justin Tobler <jltobler@xxxxxxxxx> writes:

>   ?<oid> [SP <token>=<value>]... LF
>
> where values containing LF or SP are printed in a token specific fashion
> so that the resulting encoded value does not contain either of these two
> problematic bytes. For example, missing object paths are quoted in the C
> style so they contain LF or SP.

"so" -> "when"???

> To make machine parsing easier, this series introduces a NUL-delimited
> output mode for git-rev-list(1) via a `-z` option following a suggestion
> from Junio in a previous thread[1]. In this mode, instead of LF, each
> object is delimited with two NUL bytes and any object metadata is
> separated with a single NUL byte. Examples:
>
>   <oid> NUL NUL
>   <oid> [NUL <path>] NUL NUL

Why do we need double-NUL in the above two cases?

>   ?<oid> [NUL <token>=<value>]... NUL NUL

This one I understand; we could do without double-NUL and take the
lack of "=" in the token after NUL termination as the sign that the
previous record ended, though, to avoid double-NUL while keeping the
format extensible.

As this topic is designing essentially a new and machine parseable
format, we could even unify all three formats into one.  For
example, the format could be like this:

        <oid> NUL [<attr>=<value> NUL]...

where

 (1) A record ends when a new record begins.

 (2) The beginning of a new record is signaled by <oid> that is all
     hexadecimal and does not have any '=' in it.

 (3) The traditional "rev-list --objects" output that gives path in
     addition to the object name uses "path" as the <attr> name,
     i.e. such a record looks like "<oid> NUL path=<path> NUL".

 (4) The traditional "rev-list --missing" output loses the leading
     "?"; it is replaced by "missing" as the <attr> name, i.e. such
     a record may look like "<oid> NUL missing=yes NUL..." together
     with other "<token>=<value> NUL" pairs appended as needed at
     the end.

Hmm?
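To illustrate how a consumer might read the unified format proposed
above, here is a rough Python sketch.  It is not part of any patch;
parse_records() is a made-up helper name, and it assumes that an
object name is exactly 40 (SHA-1) or 64 (SHA-256) lowercase hex digits
and that attribute names are plain ASCII.  Values are kept as bytes
because paths need not be valid UTF-8.

```python
import re

# Assumption: an object name is 40 (SHA-1) or 64 (SHA-256) lowercase hex digits.
OID_RE = re.compile(rb"[0-9a-f]{40}|[0-9a-f]{64}")


def parse_records(data: bytes):
    """Parse b'<oid> NUL [<attr>=<value> NUL]...' into (oid, attrs) pairs."""
    records = []
    fields = data.split(b"\0")
    # Every field is NUL-terminated, so drop the empty piece after the last NUL.
    if fields and fields[-1] == b"":
        fields.pop()
    for field in fields:
        if OID_RE.fullmatch(field):
            # Rule (2): an all-hex field without '=' begins a new record,
            # which by rule (1) also ends the previous one.
            records.append((field.decode("ascii"), {}))
        elif b"=" in field and records:
            # Split at the first '=' only; the value may itself contain
            # '=', SP or LF, since NUL is the sole delimiter.
            attr, _, value = field.partition(b"=")
            records[-1][1][attr.decode("ascii")] = value
        else:
            raise ValueError(f"unexpected field: {field!r}")
    return records


# Two hypothetical records: one with a path, one marked as missing.
stream = (b"a" * 40 + b"\0path=dir/my file\0"
          + b"b" * 40 + b"\0missing=yes\0")
for oid, attrs in parse_records(stream):
    print(oid, attrs)
```

Note that no double-NUL is needed here: the reader decides where a
record ends purely by whether the next field looks like an object
name or like an "<attr>=<value>" pair.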