Re: [PATCH 2/5] cat-file: mention --unordered along with --batch-all-objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 05 2021, Jeff King wrote:

> On Tue, Oct 05, 2021 at 11:02:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
>> > index 4eb0421b3f..6467707c6e 100644
>> > --- a/Documentation/git-cat-file.txt
>> > +++ b/Documentation/git-cat-file.txt
>> > @@ -94,8 +94,9 @@ OPTIONS
>> >  	Instead of reading a list of objects on stdin, perform the
>> >  	requested batch operation on all objects in the repository and
>> >  	any alternate object stores (not just reachable objects).
>> > -	Requires `--batch` or `--batch-check` be specified. Note that
>> > -	the objects are visited in order sorted by their hashes.
>> > +	Requires `--batch` or `--batch-check` be specified. By default,
>> > +	the objects are visited in order sorted by their hashes; see
>> > +	also `--unordered` below.
>> >  
>> >  --buffer::
>> >  	Normally batch output is flushed after each object is output, so
>> 
>> Since you're doing while-you're-at-it anyway: Isn't the --unordered
>> documentation also incorrect to reference --batch, which I take as it
>> lazily using as a shorthand for --batch-all-objects.
>
> I don't think so. It says:
>
> --unordered::
>         When `--batch-all-objects` is in use, visit objects in an
>         order which may be more efficient for accessing the object
>         contents than hash order. The exact details of the order are
>         unspecified, but if you do not require a specific order, this
>         should generally result in faster output, especially with
>         `--batch`.  Note that `cat-file` will still show each object
>         only once, even if it is stored multiple times in the
>         repository.
>
> So it correctly mentions that it is affecting --batch-all-objects in the
> first sentence. The "especially with --batch" is correct, too. The
> ordering has more of an effect if you are accessing the full object,
> since there we are increasing the locality which the delta-base cache
> relies on.  Whereas with --batch-check, even with size or type, that
> locality is much less important (it might help disk or even RAM caches a
> bit, but we are examining each object independently, even if it's a
> delta, and not caching the intermediate results in any way ourselves).
>
> Does that answer your question, or were you thinking of something else?

Urgh.

This is the Nth time I've completely failed to mentally model how this
command works, after having hacked on it extensively.

I thought that --batch-all-objects and --batch were mutually exclusive,
but they're obviously not.

In my defense I think the help/code is very confusing. I hacked up some
WIP changes to change it from:
    
    $ git cat-file -h
    usage: git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>
       or: git cat-file (--batch[=<format>] | --batch-check[=<format>]) [--follow-symlinks] [--textconv | --filters]
    
    <type> can be one of: blob, tree, commit, tag
        -t                    show object type
        -s                    show object size
        -e                    exit with zero when there's no error
        -p                    pretty-print object's content
        --textconv            for blob objects, run textconv on object's content
        --filters             for blob objects, run filters on object's content
        --path <blob>         use a specific path for --textconv/--filters
        --allow-unknown-type  allow -s and -t to work with broken/corrupt objects
        --buffer              buffer --batch output
        --batch[=<format>]    show info and content of objects fed from the standard input
        --batch-check[=<format>]
                              show info about objects fed from the standard input
        --follow-symlinks     follow in-tree symlinks (used with --batch or --batch-check)
        --batch-all-objects   show all objects with --batch or --batch-check
        --unordered           do not order --batch-all-objects output
    
To:
    
    usage: git cat-file (-e | -p) <object>
       or: git cat-file ( -t | -s ) [--allow-unknown-type] <object>
       or: git cat-file [--batch-all-objects] [--batch | --batch-check] [--buffer] [--follow-symlinks] [--unordered]
       or: git cat-file [--textconv|--filters] [--path=<path|tree-ish> <rev> | <rev>:<path|tree-ish>]
    
    Check object existence or emit object contents
        -e                    check if <object> exists
        -p                    pretty-print <object> content
    
    Emit [broken] object attributes
        -t                    show object type (one of 'blob', 'tree', 'commit', 'tag', ...)
        -s                    show object size
        --allow-unknown-type  allow -s and -t to work with broken/corrupt objects
    
    Run <rev>:<blobs|tree> via conversion or filter
        --textconv            for blob objects, run textconv on object's content
        --filters             for blob objects, run filters on object's content
        --path <blob>         use a specific path for --textconv/--filters
    
    Emit objects in batch via requests on STDIN, or --batch-all-objects
        --batch-all-objects   Emit all objects in the repository, instead of taking requests on STDIN
        --buffer              buffer --batch output
        --batch[=<format>]    show info and content of objects fed from the standard input
        --batch-check[=<format>]
                              show info about objects fed from the standard input
        --follow-symlinks     follow in-tree symlinks
        --unordered           do not order objects before emitting them

Which actually reflects reality, i.e. half of the options aren't
accepted by the batch mode, so grouping them makes sense, and the
current help gives the impression that e.g. --textconv can be used with
--batch, but it's effectively a CMDMODE.

Then we don't document that [--textconv|--filters] will fallback to -p
if given <tree-ish>, I'm not sure if that's intentional (but a
"fallthrough" comment suggests so).

And we silentnly accept --buffer etc. without the batch mode, but it
should die.

Looking at the history it seems you added --batch-all-objects around the
same time as the OPT_CMDMODE() was used in the command, so we should
probably have something like this to start with:

-- >8 --
Subject: [PATCH] cat-file: make --batch-all-objects a CMDMODE

The usage of OPT_CMDMODE() in "cat-file"[1] was added in parallel with
the development of[3] the --batch-all-objects option[4], so we've
since grown[5] checks that it can't be combined with other command
modes, when it should just be made a top-level command-mode
instead. It doesn't combine with --filters, --textconv etc.

1. b48158ac94c (cat-file: make the options mutually exclusive, 2015-05-03)
2. https://lore.kernel.org/git/xmqqtwspgusf.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
3. https://lore.kernel.org/git/20150622104559.GG14475@xxxxxxxx/
4. 6a951937ae1 (cat-file: add --batch-all-objects option, 2015-06-22)
5. 321459439e1 (cat-file: support --textconv/--filters in batch mode, 2016-09-09)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
---
 builtin/cat-file.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 243fe6844bc..50861bb02da 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -643,6 +643,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			    N_("for blob objects, run textconv on object's content"), 'c'),
 		OPT_CMDMODE(0, "filters", &opt,
 			    N_("for blob objects, run filters on object's content"), 'w'),
+		OPT_CMDMODE(0, "batch-all-objects", &opt,
+			    N_("show all objects with --batch or --batch-check"), 'b'),
 		OPT_STRING(0, "path", &force_path, N_("blob"),
 			   N_("use a specific path for --textconv/--filters")),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
@@ -658,8 +660,6 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			batch_option_callback),
 		OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks,
 			 N_("follow in-tree symlinks (used with --batch or --batch-check)")),
-		OPT_BOOL(0, "batch-all-objects", &batch.all_objects,
-			 N_("show all objects with --batch or --batch-check")),
 		OPT_BOOL(0, "unordered", &batch.unordered,
 			 N_("do not order --batch-all-objects output")),
 		OPT_END()
@@ -669,28 +669,25 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 
 	batch.buffer_output = -1;
 	argc = parse_options(argc, argv, prefix, options, cat_file_usage, 0);
-
-	if (opt) {
+	if (argc && batch.enabled)
+		usage_with_options(cat_file_usage, options);
+	if (opt == 'b') {
+		batch.all_objects = 1;
+	} else if (opt) {
 		if (batch.enabled && (opt == 'c' || opt == 'w'))
 			batch.cmdmode = opt;
 		else if (argc == 1)
 			obj_name = argv[0];
 		else
 			usage_with_options(cat_file_usage, options);
-	}
-	if (!opt && !batch.enabled) {
+	} else if (!opt && !batch.enabled) {
 		if (argc == 2) {
 			exp_type = argv[0];
 			obj_name = argv[1];
 		} else
 			usage_with_options(cat_file_usage, options);
-	}
-	if (batch.enabled) {
-		if (batch.cmdmode != opt || argc)
-			usage_with_options(cat_file_usage, options);
-		if (batch.cmdmode && batch.all_objects)
-			die("--batch-all-objects cannot be combined with "
-			    "--textconv nor with --filters");
+	} else if (batch.enabled && batch.cmdmode != opt) {
+		usage_with_options(cat_file_usage, options);
 	}
 
 	if ((batch.follow_symlinks || batch.all_objects) && !batch.enabled) {
-- 
2.33.0.1441.gbbcdb4c3c66




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux