Re: [PATCH 1/4] notes: print note blob to stdout directly

Maarten Bosmans <mkbosmans@xxxxxxxxx> · Sat, 17 Feb 2024 13:45:39 +0100

Op do 15 feb 2024 om 16:04 schreef Jeff King <peff@xxxxxxxx>:
>
> On Thu, Feb 15, 2024 at 08:46:02AM +0100, Maarten Bosmans wrote:
>
> > > How about:
> > >
> > >   cat some_commit_ids |
> > >   git show --stdin -s -z --format='%H%n%N'
> > >
> > Wouldn't that fail horribly with non-text blobs?
>
> Yes, if you have NULs embedded in your notes then it won't work. Any
> batch output format would require byte counts, then. If we wanted to add
> a feature to support that, I would suggest one of:
>
>   - teach the pretty-print formatter a new placeholder to output the
>     number of bytes in an element. Then you could do something like
>     "%H %(size:%N)%n%N", but it would be generally useful for other
>     cases, too.
>
>   - teach the pretty-print formatter a variant of %N that outputs only
>     the oid of the note, note the note content itself. And then you
>     could do something like:
>
>       git log --format='%(note:oid) %H' |
>       git cat-file --batch='%(objectname) %(objectsize) %(rest)'
>
>     to get the usual cat-file output of each note blob, but associated
>     with the commit it's attached to (the "%(rest)" placeholder for
>     cat-file just relays any text found after the object name of each
>     line). You might need to do some scripting between the two to handle
>     commits with no note.
>
> Of the two, I'd guess that the second one is a lot less work to
> implement (on the Git side; on the reading side it's a little more
> involved, but still should be a constant number of processes).

The second one is attractive for another reason than implementation
simplicity. While the first one offers more flexibility, the second
reuses the existing cat-file batch format, so the interface between
git and scripts is familiar and consistent.

> One variant of the second one is to use "git notes list". For example,
> you can get all notes via cat-file like this right now:
>
>   git notes list |
>   git cat-file --batch='%(objectname) %(objectsize) %(rest)'

So the cat-file batch output is suitable for blobs containing newline
or NUL characters. But I struggle a bit with what would be an easy way
of using this format in a shell script. Something with multiple read
and read -N commands reading from the output, I guess.
The git codebase has `extract_batch_output()` in t1006. This uses a
separate perl invocation to parse the cat-file output, which confirms
my suspicion there isn't a straight-forward way to do this in e.g.
just a bash script.

That was why my first steps were to accept that a launching a separate
process per note in a bash loop is a pretty clear and well understood
idiom in shell scripts and try to make the git part of that a bit more
efficient.

> You can get individual notes by asking for "git notes list <commit>",
> but it will only take one at a time. So another easy patch would be
> something like (indentation left funny to make the diff more readable):
>
> diff --git a/builtin/notes.c b/builtin/notes.c
> index e65cae0bcf..5fdad5fb8f 100644
> --- a/builtin/notes.c
> +++ b/builtin/notes.c
> @@ -446,22 +446,22 @@ static int list(int argc, const char **argv, const char *prefix)
>                 argc = parse_options(argc, argv, prefix, options,
>                                      git_notes_list_usage, 0);
>
> -       if (1 < argc) {
> -               error(_("too many arguments"));
> -               usage_with_options(git_notes_list_usage, options);
> -       }
> -
>         t = init_notes_check("list", 0);
>         if (argc) {
> -               if (repo_get_oid(the_repository, argv[0], &object))
> -                       die(_("failed to resolve '%s' as a valid ref."), argv[0]);
> +               retval = 0;
> +               while (*++argv) {
> +               if (repo_get_oid(the_repository, *argv, &object))
> +                       die(_("failed to resolve '%s' as a valid ref."), *argv);
>                 note = get_note(t, &object);
>                 if (note) {
> -                       puts(oid_to_hex(note));
> -                       retval = 0;
> +                       if (argc > 1)
> +                               printf("%s %s\n", oid_to_hex(note), oid_to_hex(&object));
> +                       else
> +                               puts(oid_to_hex(note));
>                 } else
> -                       retval = error(_("no note found for object %s."),
> +                       retval |= error(_("no note found for object %s."),
>                                        oid_to_hex(&object));
> +               }
>         } else
>                 retval = for_each_note(t, 0, list_each_note, NULL);
>
>
> That would allow:
>
>   git rev-list ... |
>   xargs git notes list |
>   git cat-file --batch='%(objectname) %(objectsize) %(rest)'
>
> We could even add a "--stdin" mode to avoid the use of xargs.

Yes, a --stdin mode for `git notes list` would be a useful building
block for scripting indeed.
I'll probably give it a try when this patch series succeeds.

Maarten