On Mon, Jan 31, 2022 at 9:34 PM John Cai <johncai86@xxxxxxxxx> wrote: > > This RFC patch proposes a new flag --batch-command that works with > git-cat-file --batch. The subject is "Re: [RFC v3] cat-file: add a --stdin-cmd mode" and now you are talking about '--batch-command' instead of '--stdin-cmd'. "that works with git-cat-file --batch" is not very clear. Maybe you could find a wording that explains better how --batch-command is different from --batch. Also I think at this point this should probably not be an RFC patch anymore but a regular one. > Similar to git-update-ref --stdin, it will accept > commands and arguments from stdin. > > The start of this idea was discussed in [1], where the original > motivation was to be able to control when the buffer was flushed to > stdout in --buffer mode. That would be nice in a cover letter but I am not sure a commit message is the right place for this. > However, this can actually be much more useful in situations when > git-cat-file --batch is being used as a long lived backend query > process. At GitLab, we use a pair of cat-file processes. One for > iterating over object metadata with --batch-check, and the other to grab > object contents with --batch. However, if we had --batch-command, we could > get rid of the second --batch-check process, Maybe s/second// would make it clear that there are no two --batch-command processes. > and just have one process > where we can flip between getting object info, and getting object contents. > This can lead to huge savings since on a given server there could be hundreds to > thousands of git cat-file processes at a time. It's not clear if all the git cat-file processes you are talking about are mostly --batch-check processes or --batch processes, or a roughly equal amount of both. My guess is the latter and that --batch-command would mean that there would be around two times fewer cat-file processes. > git cat-file --batch-command > > $ <command> [arg1] [arg2] NL It's a bit unclear what the 2 above lines mean. Maybe you could add a small explanation like for example "The new flag can be used like this:" and "It receives commands from stdin in the format:" Also not sure why there is a '$' char in front of '<command> [arg1] [arg2] NL' but not in front of 'git cat-file --batch-command'. It doesn't look like in the 'git update-ref --stdin' doc that '$' are used in front of the commands that can be passed through stdin. > This patch adds three commands: object, info, fflush Maybe s/three commands/the following first three commands/ > $ object <sha1> NL > $ info <sha1> NL > $ fflush NL Idem about '$'. > These three would be immediately useful in GitLab's context, but one can > imagine this mode to be further extended for other things. Not very clear which "mode" you are talking about. You have been talking about a mode only in the subject so far. Maybe you should talk a bit about that above when '<command> [arg1] [arg2] NL' is introduced. Also you don't talk about the output format. --batch and --batch-check accept [=<format>], but it looks like --batch-command doesn't. > Future improvements: > - a non-trivial part of "cat-file --batch" time is spent > on parsing its argument and seeing if it's a revision, ref etc. So we > could add a command that only accepts a full-length 40 > character SHA-1. In a cover letter that would be ok, but I am not sure that a commit message is the best place for this kind of details about future work. > This would be the first step in adding such an interface to > git-cat-file. > > [1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@xxxxxxxxx/ > > Helped-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> > Signed-off-by: John Cai <johncai86@xxxxxxxxx> > --- > > Taylor, I'd be interested in your thoughts on this proposal since you helped > review the previous patch that turned into this RFC. Thanks! > > Changes from v2: > > - refactored tests to be within run_tests() > - added a test to test --buffer behavior with fflush > - cleaned up cat-file.c: clarified var names, removed unnecessary code > based on suggestions from Phillip Wood > - removed strvec changes > > Changes from v1: > > - changed option name to batch-command. > - changed command function interface to receive the whole line after the command > name to put the onus of parsing arguments to each individual command function. > - pass in whole line to batch_one_object in both parse_cmd_object and > parse_cmd_info to support spaces in the object reference. > - removed addition of -z to include in a separate patch series > - added documentation. > --- > Documentation/git-cat-file.txt | 15 +++++ > builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++-- > t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++ > 3 files changed, 205 insertions(+), 7 deletions(-) > > diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt > index bef76f4dd0..254e546c79 100644 > --- a/Documentation/git-cat-file.txt > +++ b/Documentation/git-cat-file.txt > @@ -96,6 +96,21 @@ OPTIONS > need to specify the path, separated by whitespace. See the > section `BATCH OUTPUT` below for details. > > +--batch-command:: > + Enter a command mode that reads from stdin. Maybe s/a command mode that reads from stdin/a mode that reads commands from stdin/ Also I would expect something about the output, like perhaps "...and ouputs the command results to stdout". > May not be combined with any > + other options or arguments except `--textconv` or `--filters`, in which > + case the input lines also need to specify the path, separated by > + whitespace. See the section `BATCH OUTPUT` below for details. The BATCH OUTPUT section says that a format can be passed but that doesn't seem to be the case with --batch-command. So you might need to make some changes to that section too or add a bit more details about the output here. > +object <object>:: > + Print object contents for object reference <object> > + > +info <object>:: > + Print object info for object reference <object> > + > +flush:: > + Flush to stdout immediately when used with --buffer > + > --batch-all-objects:: > Instead of reading a list of objects on stdin, perform the > requested batch operation on all objects in the repository and > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index 7b3f42950e..cc9e47943b 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -24,9 +24,11 @@ struct batch_options { > int buffer_output; > int all_objects; > int unordered; > - int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ > + int mode; /* may be 'w' or 'c' for --filters or --textconv */ > const char *format; > + int command; > }; Maybe add a blank line here. > +static char line_termination = '\n'; > > static const char *force_path; > > @@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d > if (data->type == OBJ_BLOB) { > if (opt->buffer_output) > fflush(stdout); > - if (opt->cmdmode) { > + if (opt->mode) { The mechanical s/cmdmode/mode/g change could have been made in a preparatory patch to make this patch a bit smaller and easier to digest. > +static void batch_objects_command(struct batch_options *opt, > + struct strbuf *output, > + struct expand_data *data) > +{ > + struct strbuf input = STRBUF_INIT; > + > + /* Read each line dispatch its command */ > + while (!strbuf_getwholeline(&input, stdin, line_termination)) { > + int i; > + const struct parse_cmd *cmd = NULL; > + const char *p, *cmd_end; > + > + if (*input.buf == line_termination) > + die("empty command in input"); > + else if (isspace(*input.buf)) > + die("whitespace before command: %s", input.buf); > + > + for (i = 0; i < ARRAY_SIZE(commands); i++) { > + const char *prefix = commands[i].prefix; > + char c; > + if (!skip_prefix(input.buf, prefix, &cmd_end)) > + continue; > + /* > + * If the command has arguments, verify that it's > + * followed by a space. Otherwise, it shall be followed > + * by a line terminator. > + */ > + c = commands[i].takes_args ? ' ' : line_termination; > + if (input.buf[strlen(prefix)] != c) > + die("arguments invalid for command: %s", commands[i].prefix); > + > + cmd = &commands[i]; > + if (cmd->takes_args) { > + p = cmd_end + 1; > + // strip newline before handing it to the > + // handling function So above the /* */ comments delimiters are used but here // is used. I am not sure we support // these days, but if we do, I think it would be better to avoid mixing comment styles in the same function. > + input.buf[strcspn(input.buf, "\n")] = '\0'; > + } > + > + break; > + } > + > + if (!cmd) > + die("unknown command: %s", input.buf); > + > + cmd->fn(opt, p, output, data); > + } > + strbuf_release(&input); > +} > @@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt) > save_warning = warn_on_object_refname_ambiguity; > warn_on_object_refname_ambiguity = 0; > > + if (command) > + batch_objects_command(opt, &output, &data); > + > while (strbuf_getline(&input, stdin) != EOF) { I think batch_objects_command() will consume everything from stdin, so it doesn't make sense to try to read again from stdin after it. Maybe the whole while (...) { ... } clause should be inside an else clause or something. > if (data.split_on_whitespace) { > /* > @@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt, > > bo->enabled = 1; > bo->print_contents = !strcmp(opt->long_name, "batch"); > + bo->command = !strcmp(opt->long_name, "batch-command"); > bo->format = arg; > > return 0;