Re: [PATCH v2 2/2] cat-file: add --batch-command mode

Phillip Wood <phillip.wood123@xxxxxxxxx> · Tue, 8 Feb 2022 11:00:21 +0000

Hi Jonathan and John

On 07/02/2022 23:34, Jonathan Tan wrote:
"John Cai via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of long lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings since on a given server.

One other benefit is that with explicit flushes, in a partial clone,
this makes it possible to batch prefetch objects.

Jonathan is there any overlap between what this series is trying to do 
and your proposal for a batch command[1]? For example would extending 
this series to get blob sizes be useful to you?

Best Wishes

Phillip

[1] 
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@xxxxxxxxxx/

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..618dbd15338 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,25 @@ OPTIONS
  	need to specify the path, separated by whitespace.  See the
  	section `BATCH OUTPUT` below for details.
  
+--batch-command::
+	Enter a command mode that reads commands and arguments from stdin.
+	May not be combined with any other options or arguments except
+	`--textconv` or `--filters`, in which case the input lines also need to
+	specify the path, separated by whitespace.  See the section
+	`BATCH OUTPUT` below for details.
+
+contents <object>::
+	Print object contents for object reference <object>
+
+info <object>::
+	Print object info for object reference <object>
+
+flush::
+	Execute all preceding commands that were issued since the beginning or
+	since the last flush command was issued. Only used with --buffer. When
+	--buffer is not used, commands are flushed each time without issuing
+	`flush`.

The way this is formatted leads me to think that "contents", etc. are
CLI arguments, not things written to stdin. Some of the commit message
probably needs to go here.

I just looked at the commit message and documentation for now.

If you have time and are interested, we at Google are thinking of a more
comprehensive "batch" process [1].

[1] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@xxxxxxxxxx/