Hi Christian On 1 Feb 2022, at 4:39, Christian Couder wrote: > On Mon, Jan 31, 2022 at 9:34 PM John Cai <johncai86@xxxxxxxxx> wrote: >> >> This RFC patch proposes a new flag --batch-command that works with >> git-cat-file --batch. > > The subject is "Re: [RFC v3] cat-file: add a --stdin-cmd mode" and now > you are talking about '--batch-command' instead of '--stdin-cmd'. > > "that works with git-cat-file --batch" is not very clear. Maybe you > could find a wording that explains better how --batch-command is > different from --batch. > > Also I think at this point this should probably not be an RFC patch > anymore but a regular one. > >> Similar to git-update-ref --stdin, it will accept >> commands and arguments from stdin. >> >> The start of this idea was discussed in [1], where the original >> motivation was to be able to control when the buffer was flushed to >> stdout in --buffer mode. > > That would be nice in a cover letter but I am not sure a commit > message is the right place for this. > >> However, this can actually be much more useful in situations when >> git-cat-file --batch is being used as a long lived backend query >> process. At GitLab, we use a pair of cat-file processes. One for >> iterating over object metadata with --batch-check, and the other to grab >> object contents with --batch. However, if we had --batch-command, we could >> get rid of the second --batch-check process, > > Maybe s/second// would make it clear that there are no two > --batch-command processes. > >> and just have one process >> where we can flip between getting object info, and getting object contents. >> This can lead to huge savings since on a given server there could be hundreds to >> thousands of git cat-file processes at a time. > > It's not clear if all the git cat-file processes you are talking about > are mostly --batch-check processes or --batch processes, or a roughly > equal amount of both. My guess is the latter and that --batch-command > would mean that there would be around two times fewer cat-file > processes. > >> git cat-file --batch-command >> >> $ <command> [arg1] [arg2] NL > > It's a bit unclear what the 2 above lines mean. Maybe you could add a > small explanation like for example "The new flag can be used like > this:" and "It receives commands from stdin in the format:" > > Also not sure why there is a '$' char in front of '<command> [arg1] > [arg2] NL' but not in front of 'git cat-file --batch-command'. It > doesn't look like in the 'git update-ref --stdin' doc that '$' are > used in front of the commands that can be passed through stdin. > >> This patch adds three commands: object, info, fflush > > Maybe s/three commands/the following first three commands/ > >> $ object <sha1> NL >> $ info <sha1> NL >> $ fflush NL > > Idem about '$'. > >> These three would be immediately useful in GitLab's context, but one can >> imagine this mode to be further extended for other things. > > Not very clear which "mode" you are talking about. You have been > talking about a mode only in the subject so far. Maybe you should talk > a bit about that above when '<command> [arg1] [arg2] NL' is > introduced. > > Also you don't talk about the output format. --batch and --batch-check > accept [=<format>], but it looks like --batch-command doesn't. > >> Future improvements: >> - a non-trivial part of "cat-file --batch" time is spent >> on parsing its argument and seeing if it's a revision, ref etc. So we >> could add a command that only accepts a full-length 40 >> character SHA-1. > > In a cover letter that would be ok, but I am not sure that a commit > message is the best place for this kind of details about future work. > >> This would be the first step in adding such an interface to >> git-cat-file. >> >> [1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@xxxxxxxxx/ >> >> Helped-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> >> Signed-off-by: John Cai <johncai86@xxxxxxxxx> >> --- >> >> Taylor, I'd be interested in your thoughts on this proposal since you helped >> review the previous patch that turned into this RFC. Thanks! >> >> Changes from v2: >> >> - refactored tests to be within run_tests() >> - added a test to test --buffer behavior with fflush >> - cleaned up cat-file.c: clarified var names, removed unnecessary code >> based on suggestions from Phillip Wood >> - removed strvec changes >> >> Changes from v1: >> >> - changed option name to batch-command. >> - changed command function interface to receive the whole line after the command >> name to put the onus of parsing arguments to each individual command function. >> - pass in whole line to batch_one_object in both parse_cmd_object and >> parse_cmd_info to support spaces in the object reference. >> - removed addition of -z to include in a separate patch series >> - added documentation. >> --- >> Documentation/git-cat-file.txt | 15 +++++ >> builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++-- >> t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++ >> 3 files changed, 205 insertions(+), 7 deletions(-) >> >> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt >> index bef76f4dd0..254e546c79 100644 >> --- a/Documentation/git-cat-file.txt >> +++ b/Documentation/git-cat-file.txt >> @@ -96,6 +96,21 @@ OPTIONS >> need to specify the path, separated by whitespace. See the >> section `BATCH OUTPUT` below for details. >> >> +--batch-command:: >> + Enter a command mode that reads from stdin. > > Maybe s/a command mode that reads from stdin/a mode that reads > commands from stdin/ > > Also I would expect something about the output, like perhaps "...and > ouputs the command results to stdout". > >> May not be combined with any >> + other options or arguments except `--textconv` or `--filters`, in which >> + case the input lines also need to specify the path, separated by >> + whitespace. See the section `BATCH OUTPUT` below for details. > > The BATCH OUTPUT section says that a format can be passed but that > doesn't seem to be the case with --batch-command. So you might need to > make some changes to that section too or add a bit more details about > the output here. Thinking about this more, I wonder if it'd be worth it to allow the --batch-command=<format> for backwards compatibility reasons for users who switch over to using --batch-command from --batch and --batch-check. The only thing that makes me hesitant is that the <format> would only be relevant to the "info" and "object" commands instead of being relevant to all commands in --batch-command mode. > >> +object <object>:: >> + Print object contents for object reference <object> >> + >> +info <object>:: >> + Print object info for object reference <object> >> + >> +flush:: >> + Flush to stdout immediately when used with --buffer >> + >> --batch-all-objects:: >> Instead of reading a list of objects on stdin, perform the >> requested batch operation on all objects in the repository and >> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >> index 7b3f42950e..cc9e47943b 100644 >> --- a/builtin/cat-file.c >> +++ b/builtin/cat-file.c >> @@ -24,9 +24,11 @@ struct batch_options { >> int buffer_output; >> int all_objects; >> int unordered; >> - int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ >> + int mode; /* may be 'w' or 'c' for --filters or --textconv */ >> const char *format; >> + int command; >> }; > > Maybe add a blank line here. > >> +static char line_termination = '\n'; >> >> static const char *force_path; >> >> @@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >> if (data->type == OBJ_BLOB) { >> if (opt->buffer_output) >> fflush(stdout); >> - if (opt->cmdmode) { >> + if (opt->mode) { > > The mechanical s/cmdmode/mode/g change could have been made in a > preparatory patch to make this patch a bit smaller and easier to > digest. > >> +static void batch_objects_command(struct batch_options *opt, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + struct strbuf input = STRBUF_INIT; >> + >> + /* Read each line dispatch its command */ >> + while (!strbuf_getwholeline(&input, stdin, line_termination)) { >> + int i; >> + const struct parse_cmd *cmd = NULL; >> + const char *p, *cmd_end; >> + >> + if (*input.buf == line_termination) >> + die("empty command in input"); >> + else if (isspace(*input.buf)) >> + die("whitespace before command: %s", input.buf); >> + >> + for (i = 0; i < ARRAY_SIZE(commands); i++) { >> + const char *prefix = commands[i].prefix; >> + char c; >> + if (!skip_prefix(input.buf, prefix, &cmd_end)) >> + continue; >> + /* >> + * If the command has arguments, verify that it's >> + * followed by a space. Otherwise, it shall be followed >> + * by a line terminator. >> + */ >> + c = commands[i].takes_args ? ' ' : line_termination; >> + if (input.buf[strlen(prefix)] != c) >> + die("arguments invalid for command: %s", commands[i].prefix); >> + >> + cmd = &commands[i]; >> + if (cmd->takes_args) { >> + p = cmd_end + 1; >> + // strip newline before handing it to the >> + // handling function > > So above the /* */ comments delimiters are used but here // is used. I > am not sure we support // these days, but if we do, I think it would > be better to avoid mixing comment styles in the same function. > >> + input.buf[strcspn(input.buf, "\n")] = '\0'; >> + } >> + >> + break; >> + } >> + >> + if (!cmd) >> + die("unknown command: %s", input.buf); >> + >> + cmd->fn(opt, p, output, data); >> + } >> + strbuf_release(&input); >> +} > >> @@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt) >> save_warning = warn_on_object_refname_ambiguity; >> warn_on_object_refname_ambiguity = 0; >> >> + if (command) >> + batch_objects_command(opt, &output, &data); >> + >> while (strbuf_getline(&input, stdin) != EOF) { > > I think batch_objects_command() will consume everything from stdin, so > it doesn't make sense to try to read again from stdin after it. Maybe > the whole while (...) { ... } clause should be inside an else clause > or something. > >> if (data.split_on_whitespace) { >> /* >> @@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt, >> >> bo->enabled = 1; >> bo->print_contents = !strcmp(opt->long_name, "batch"); >> + bo->command = !strcmp(opt->long_name, "batch-command"); >> bo->format = arg; >> >> return 0;