Re: [PATCH v5 5/6] rev-list: add list-objects filtering support

Jonathan Nieder <jrnieder@xxxxxxxxx> · Wed, 22 Nov 2017 12:08:53 -0800

Hi,

Jeff Hostetler wrote:

> Teach rev-list to use the filtering provided by the
> traverse_commit_list_filtered() interface to omit
> unwanted objects from the result.
>
> Object filtering is only allowed when one of the "--objects*"
> options are used.

micronit: the line widths seem to be uneven in these commit messages,
which is a bit distracting when reading.

> When the "--filter-print-omitted" option is used, the omitted
> objects are printed at the end.  These are marked with a "~".
> This option can be combined with "--quiet" to get a list of
> just the omitted objects.

Neat.  Can you give a quick example?

Using --quiet for this feels a bit odd, since it previously meant
to print nothing to stdout.  I wonder if there's another way ---
e.g.

	--print-omitted=(yes|no|only)

If I wanted to list all objects matching a filter, even objects
that are not reachable from any ref, is there a way to do that?
(Just curious, trying to think about this interface.)

> Add t6112 test.

This part doesn't need to be in the commit message.  More generally,
anything I could more easily learn from the code or diffstat doesn't
need to be in the commit message: the commit message is about the
"why" more than the details of what in the code changed.

> In the future, we will introduce a "partial clone" mechanism
> wherein an object in a repo, obtained from a remote, may
> reference a missing object that can be dynamically fetched from
> that remote once needed.  This "partial clone" mechanism will
> have a way, sometimes slow, of determining if a missing link
> is one of the links expected to be produced by this mechanism.

Does this mean the <filter-spec>s will be part of the wire protocol?
I'll look more carefully at them below with that in mind.

> This patch introduces handling of missing objects to help
> debugging and development of the "partial clone" mechanism,
> and once the mechanism is implemented, for a power user to
> perform operations that are missing-object aware without
> incurring the cost of checking if a missing link is expected.

I had trouble understanding what this paragraph is about.  Can you
give an example?

> Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> ---
>  Documentation/git-rev-list.txt      |   4 +-
>  Documentation/rev-list-options.txt  |  36 ++++++
>  builtin/rev-list.c                  | 108 ++++++++++++++++-
>  t/t6112-rev-list-filters-objects.sh | 225 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 370 insertions(+), 3 deletions(-)
>  create mode 100755 t/t6112-rev-list-filters-objects.sh

Looks reasonably concise, good.

[...]
> --- a/Documentation/git-rev-list.txt
> +++ b/Documentation/git-rev-list.txt
> @@ -47,7 +47,9 @@ SYNOPSIS
>  	     [ --fixed-strings | -F ]
>  	     [ --date=<format>]
>  	     [ [ --objects | --objects-edge | --objects-edge-aggressive ]
> -	       [ --unpacked ] ]
> +	       [ --unpacked ]
> +	       [ --filter=<filter-spec> [ --filter-print-omitted ] ] ]

Does this mean --filter is only useful with --objects?  E.g. I can't
use it to filter commits?

> +	     [ --missing=<missing-action> ]

--missing=(error|allow-any|print) would be more informative and about
equally concise.

Since this is mainly for debugging, does it have a different
compatibility guarantee from other options?  Could it be named
accordingly to set expectations?

[...]
> +The form '--filter=blob:none' omits all blobs.

Sounds sensible.

> ++
> +The form '--filter=blob:limit=<n>[kmg]' omits blobs larger than n bytes
> +or units.  The value may be zero.

On second thought, doesn't blob:limit=0 mean blob:none is not needed?
Is it for future consistency with tree:none?

What units do [kmg] use? Are they GB, GiB, or one of the variants in
between?

> ++
> +The form '--filter=sparse:oid=<oid-ish>' uses a sparse-checkout
> +specification contained in the object (or the object that the expression
> +evaluates to) to omit blobs that would not be not required for a
> +sparse checkout on the requested refs.

This one makes me a little nervous because it would mean we're
planning on adding sparse-checkout specifications to the wire
protocol.  Maybe that's okay --- they're already part of the on-disk
format --- but it makes me nervous because the sparse-checkout format
is not so great, as I believe MS has already noticed.

What is an <oid-ish>?  Can it just say <blob>?  How would this one
work when passed over the wire?

> ++
> +The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
> +specification contained in <path>.

Is this <path> relative to the cwd of the caller, or is it within some
commit?

Sorry it took so long to send this feedback / these questions.
Hopefully it's useful nevertheless.

Thanks and hope that helps,
Jonathan