On 25/03/10 01:37PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@xxxxxxxxx> writes:
> > To make machine parsing easier, this series introduces a NUL-delimited
> > output mode for git-rev-list(1) via a `-z` option following a suggestion
> > from Junio in a previous thread[1]. In this mode, instead of LF, each
> > object is delimited with two NUL bytes and any object metadata is
> > separated with a single NUL byte. Examples:
> >
> > <oid> NUL NUL
> > <oid> [NUL <path>] NUL NUL
>
> Why do we need double-NUL in the above two cases?

In the `<oid> [NUL <path>] NUL NUL` case, an object path could
technically be identical to an OID (for example, a file whose name is
40 hexadecimal characters), so a parser reading a field after an OID
cannot tell whether it is that object's path or the next record's OID.
The two NUL bytes signal where the object record ends. Without some
other mechanism to know when a record starts and stops, even the
`<oid> NUL NUL` case needs the two trailing NUL bytes so that the next
OID is not considered a potential path. If the output format never
appended any additional object metadata, a single NUL byte would be
enough to delimit objects, but always using two NUL bytes keeps the
format consistent.

> > ?<oid> [NUL <token>=<value>]... NUL NUL
>
> This one I understand; we could do without double-NUL and take the
> lack of "=" in the token after NUL termination as the sign that the
> previous record ended, though, to avoid double-NUL while keeping the
> format extensible.
>
> As this topic is designing essentially a new and machine parseable
> format, we could even unify all three formats into one. For example,
> the format could be like this:
>
> <oid> NUL [<attr>=<value> NUL]...

I was also considering something similar. This format is flexible
enough to support other object metadata, such as `--timestamp`, in the
future. In the next version I'll implement a unified format here.

>
> where
>
> (1) A record ends when a new record begins.
>
> (2) The beginning of a new record is signaled by <oid> that is all
> hexadecimal and does not have any '=' in it.

I think this is a good idea. By always appending printed object
metadata in the form `<token>=<value>`, we know that any entry without
'=' must be the start of a new record. This removes the need for two
NUL bytes to indicate the end of a record. In the next version I'll use
only a single NUL byte as the delimiter.

>
> (3) The traditional "rev-list --objects" output that gives path in
> addition to the object name uses "path" as the <attr> name,
> i.e. such a record looks like "<oid> NUL path=<path> NUL".
>
> (4) The traditional "rev-list --missing" output loses the leading
> "?"; it is replaced by "missing" as the <attr> name, i.e. such
> a record may look like "<oid> NUL missing=yes NUL..." together
> with other "<token>=<value> NUL" pairs appended as needed at
> the end.

I think this is good. Instead of prefixing missing OIDs with '?', we
can just append another token/value pair, `missing=yes`.

Thanks,
-Justin
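To illustrate how a consumer might read the unified `<oid> NUL [<attr>=<value> NUL]...` format under rules (1) and (2), here is a minimal Python sketch. It is not the series' implementation; it assumes lowercase hexadecimal OIDs, that every field (including the last) is terminated by a single NUL, and that attribute names never contain '='. The helpers `format_record` and `parse_records` are hypothetical names used only for this example.

```python
from typing import Dict, List, Tuple

# Assumption: OIDs are printed as lowercase hexadecimal.
HEX_DIGITS = b"0123456789abcdef"

Record = Tuple[bytes, Dict[bytes, bytes]]


def format_record(oid: bytes, attrs: Dict[bytes, bytes]) -> bytes:
    """Hypothetical writer: emit '<oid> NUL [<attr>=<value> NUL]...'."""
    out = bytearray(oid + b"\0")
    for attr, value in attrs.items():
        out += attr + b"=" + value + b"\0"
    return bytes(out)


def is_oid(field: bytes) -> bool:
    # Rule (2): a new record starts with a field that is all hex
    # (which also means it contains no '=').
    return len(field) > 0 and all(byte in HEX_DIGITS for byte in field)


def parse_records(data: bytes) -> List[Record]:
    """Hypothetical reader: split the stream back into (oid, attrs)."""
    fields = data.split(b"\0")
    # Every field is NUL-terminated, so the final split element is empty.
    if fields and fields[-1] == b"":
        fields.pop()
    records: List[Record] = []
    for field in fields:
        if is_oid(field):
            # Rule (1): a record ends when a new record begins.
            records.append((field, {}))
        elif b"=" in field and records:
            # Split on the first '=' only; values such as paths may contain '='.
            attr, _, value = field.partition(b"=")
            records[-1][1][attr] = value
        else:
            raise ValueError(f"unexpected field: {field!r}")
    return records


if __name__ == "__main__":
    stream = (
        format_record(b"a" * 40, {})
        + format_record(b"b" * 40, {b"path": b"c" * 40})
        + format_record(b"d" * 40, {b"missing": b"yes"})
    )
    for oid, attrs in parse_records(stream):
        print(oid[:7], attrs)
```

Note the second record: its path is 40 hex characters, exactly the case that motivated the double NUL. Because the field is written as `path=<value>`, it contains '=' and can never be mistaken for the start of a new record, so a single NUL per field is sufficient.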