On Mon, Mar 10, 2025 at 3:32 PM Justin Tobler <jltobler@xxxxxxxxx> wrote:
>
> When walking objects, git-rev-list(1) prints each object entry on a
> separate line in the form:
>
>     <oid> LF
>
> Some options, such as `--objects`, may print additional information
> about the object on the same line:
>
>     <oid> SP [<path>] LF
>
> In this mode, if the object path contains a newline it is truncated at
> the newline.
>
> When the `--missing={print,print-info}` option is provided, information
> about any missing objects encountered during the object walk is also
> printed in the form:
>
>     ?<oid> [SP <token>=<value>]... LF
>
> where values containing LF or SP are printed in a token specific fashion
> so that the resulting encoded value does not contain either of these two
> problematic bytes. For example, missing object paths are quoted in the C
> style so they do not contain LF or SP.
>
> To make machine parsing easier, this series introduces a NUL-delimited
> output mode for git-rev-list(1) via a `-z` option following a suggestion
> from Junio in a previous thread[1]. In this mode, instead of LF, each
> object is delimited with two NUL bytes and any object metadata is
> separated with a single NUL byte. Examples:
>
>     <oid> NUL NUL
>     <oid> [NUL <path>] NUL NUL
>     ?<oid> [NUL <token>=<value>]... NUL NUL
>
> In this mode, path and value info are printed as-is without any special
> encoding or truncation.
>
> For now this series only adds support for use with the `--objects` and
> `--missing` options. Usage of `-z` with other options is rejected, so it
> can potentially be added in the future.
>
> One idea I had, but did not implement in this version, was to also use
> the `<token>=<value>` format for regular non-missing object info while
> in the NUL-delimited mode. I could see this being a bit more flexible
> instead of relying strictly on order. Interested if anyone has thoughts
> on this.

:)

Without taking a deeper look, I think token=value has the benefit of
being self-describing at the cost of more output bytes (which might
matter over the wire, for example).

Generally I like the idea; sometimes I find it troublesome having to
parse prose manuals for the specifics of output formats like field
order, especially when I end up coding a parser for the format. If the
field order doesn’t matter to the consumer, then perhaps using ordered
fields AWK-style is inappropriately terse?

OTOH, the -z format is for machines, and they don’t need human labels ;)

[I think token labels would be a great parser-writing and debugging aid]

Best,
Ben

>
> This series is structured as follows:
>
> - Patches 1 and 2 do some minor preparatory refactors.
>
> - Patch 3 adds the `-z` option to git-rev-list(1) to print
>   objects in a NUL-delimited fashion. Printed object paths with
>   the `--objects` option are also handled.
>
> - Patch 4 teaches the `--missing` option how to print info in a
>   NUL-delimited fashion.
>
> Thanks for taking a look,
> -Justin
>
> [1]: <xmqq5xlor0la.fsf@gitster.g>
>
> Justin Tobler (4):
>   rev-list: inline `show_object_with_name()` in `show_object()`
>   rev-list: refactor early option parsing
>   rev-list: support delimiting objects with NUL bytes
>   rev-list: support NUL-delimited --missing option
>
>  Documentation/rev-list-options.adoc | 26 +++++++++
>  builtin/rev-list.c                  | 86 ++++++++++++++++++++++-------
>  revision.c                          |  8 ---
>  revision.h                          |  2 -
>  t/t6000-rev-list-misc.sh            | 34 ++++++++++++
>  t/t6022-rev-list-missing.sh         | 30 ++++++++++
>  6 files changed, 155 insertions(+), 31 deletions(-)
>
>
> base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3
> --
> 2.49.0.rc2
>
>

--
D. Ben Knoble
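
To illustrate the parser-writing point, here is a minimal Python sketch
of a consumer for the NUL-delimited layout as proposed in the quoted
cover letter (each record ends with two NUL bytes, fields within a
record are separated by one). It is not part of the series and it works
on a byte string rather than invoking git. The names `Entry`,
`parse_rev_list_z`, and `missing_info` are hypothetical, and the sketch
assumes no field is ever empty, since an empty field would be
indistinguishable from a record terminator.

```python
from typing import Dict, Iterator, List, NamedTuple


class Entry(NamedTuple):
    oid: str            # object ID without the leading "?"
    missing: bool       # True if the record started with "?"
    fields: List[bytes] # raw metadata fields, unmodified


def parse_rev_list_z(data: bytes) -> Iterator[Entry]:
    """Split the proposed -z layout into entries (assumed format)."""
    for record in data.split(b"\0\0"):
        if not record:
            continue  # empty piece left after the final terminator
        oid, *fields = record.split(b"\0")
        missing = oid.startswith(b"?")
        if missing:
            oid = oid[1:]
        yield Entry(oid.decode("ascii"), missing, fields)


def missing_info(entry: Entry) -> Dict[bytes, bytes]:
    """Turn <token>=<value> fields into a dict; order does not matter."""
    return dict(field.split(b"=", 1) for field in entry.fields)
```

With positional fields the consumer has to know from the manual that the
first field after the OID is a path. With `<token>=<value>` fields, a
helper like `missing_info` can look values up by name regardless of
order, at the cost of the extra label bytes.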